NAVAL 

POSTGRADUATE 

SCHOOL 

MONTEREY,  CALIFORNIA 


THESIS 


INFRARED  IMAGING  FACE  RECOGNITION  USING 
NONLINEAR  KERNEL-BASED  CLASSIFIERS 

by 

Dimitrios  I.  Domboulas 
December  2004 

Thesis  Committee  Supervisor:  Monique  P.  Fargues 

Thesis  Committee  Members:  Roberto  Cristi 

Gamani  Karunasiri 


Approved  for  public  release;  distribution  is  unlimited 


THIS  PAGE  INTENTIONALLY  LEFT  BLANK 


|  REPORT  DOCUMENTATION  PAGE 

Form  Approved  OMB  No.  0704-0188  | 

Public  reporting  burden  for  this  collection  of  information  is  estimated  to  average  1  hour  per  response,  including 
the  time  for  reviewing  instruction,  searching  existing  data  sources,  gathering  and  maintaining  the  data  needed,  and 
completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any 
other  aspect  of  this  collection  of  information,  including  suggestions  for  reducing  this  burden,  to  Washington 
headquarters  Services,  Directorate  for  Information  Operations  and  Reports,  1215  Jefferson  Davis  Highway,  Suite 
1204,  Arlington,  VA  22202^1302,  and  to  the  Office  of  Management  and  Budget,  Paperwork  Reduction  Project 
(0704-0188)  Washington  DC  20503. 

1.  AGENCY  USE  ONLY  (Leave  blank)  2.  REPORT  DATE  3.  REPORT  TYPE  AND  DATES  COVERED 

December  2004  Engineer’s  Thesis 

4.  TITLE  AND  SUBTITLE:  Infrared  Imaging  Face  Recognition  Using  Nonlinear 
Kernel-Based  Classifiers 

5.  FUNDING  NUMBERS 

6.  AUTHOR(S)  Dimitrios  I.  Domboulas 

7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

Naval  Postgraduate  School 

Monterey,  CA  93943-5000 

8.  PERFORMING 

ORGANIZATION  REPORT 
NUMBER 

9.  SPONSORING  /MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

N/A 

10.  SPONSORING/MONITORING 
AGENCY  REPORT  NUMBER 

11.  SUPPLEMENTARY  NOTES  The  views  expressed  in  this  thesis  are  those  of  the  author  and  do  not  reflect  the  official 
policy  or  position  of  the  Department  of  Defense  or  the  U.S.  Government. 

12a.  DISTRIBUTION  /  AVAILABILITY  STATEMENT 

Approved  for  public  release;  distribution  is  unlimited 

12b.  DISTRIBUTION  CODE 

13.  ABSTRACT  (maximum  200  words) 

In  recent  years  there  has  been  an  increased  interest  in  effective  individual  control  and  enhanced  security  measures, 
and  face  recognition  schemes  play  an  important  role  in  this  increasing  market.  In  the  past,  most  face  recognition  research  stud¬ 
ies  have  been  conducted  with  visible  imaging  data.  Only  recently  have  IR  imaging  face  recognition  studies  been  reported  for 
wide  use  applications,  as  uncooled  IR  imaging  technology  has  improved  to  the  point  where  the  resolution  of  these  much 
cheaper  cameras  closely  approaches  that  of  cooled  counterparts.  This  study  is  part  of  an  on-going  research  conducted  at  the 

Naval  Postgraduate  School  which  investigates  the  feasibility  of  applying  a  low  cost  uncooled  IR  camera  for  face  recognition 
applications.  This  specific  study  investigates  whether  nonlinear  kernel-based  classifiers  may  improve  overall  classification 
rates  over  those  obtained  with  linear  classification  schemes.  The  study  is  applied  to  a  50  subject  IR  database  developed  in 
house  with  a  low  resolution  uncooled  IR  camera.  Results  show  best  overall  mean  classification  performances  around  98.55% 
which  represents  a  5%  performance  improvement  over  the  best  linear  classifier  results  obtained  previously  on  the  same  data¬ 
base.  This  study  also  considers  several  metrics  to  evaluate  the  impacts  variations  in  various  user-specified  parameters  have  on 
the  resulting  classification  performances.  These  results  show  that  a  low-cost,  low-resolution  IR  camera  combined  with  an  effi¬ 
cient  classifier  can  play  an  effective  role  in  security  related  applications. 

14.  SUBJECT  TERMS 

Face  Recognition,  Pattern  Classification,  Infrared,  GDA,  Distances,  Eigenvectors 

15.  NUMBER  OF 
PAGES 

129 

16.  PRICE  CODE 

17.  SECURITY 
CLASSIFICATION  OF 
REPORT 

Unclassified 

18.  SECURITY 

CLASSIFICATION  OF  THIS 
PAGE 

Unclassified 

19.  SECURITY 
CLASSIFICATION  OF 
ABSTRACT 

Unclassified 

20.  LIMITATION 

OF  ABSTRACT 

UL 

NSN  7540-01-280-5500  Standard  Form  298  (Rev.  2-89) 


Prescribed  by  ANSI  Std.  239-18 


1 


THIS  PAGE  INTENTIONALLY  LEFT  BLANK 


11 


Approved  for  public  release;  distribution  is  unlimited 

INFRARED  IMAGING  FACE  RECOGNITION  USING  NONLINEAR  KERNEL- 

BASED  CLASSIFIERS 

Dimitrios  I.  Domboulas 
Captain,  Hellenic  Air  Force 

B.S.,  in  Electronic  Engineering,  Hellenic  Air  Force  Academy,  Athens,  1992 

Submitted  in  partial  fulfillment  of  the 
requirements  for  the  degrees  of 

ELECTRICAL  ENGINEER 

and 

MASTER  OF  SCIENCE  IN  ELECTRICAL  ENGINEERING 

from  the 

NAVAL  POSTGRADUATE  SCHOOL 
December  2004 


Author:  Dimitrios  I.  Domboulas 


Approved  by:  Monique  P.  Fargues 

Thesis  Committee  Supervisor 


Roberto  Cristi 

Thesis  Committee  Member 


Gamani  Karunasiri 
Thesis  Committee  Member 


John  P.  Powers 

Chairman,  Department  of  Electrical  and  Computer  Engineering 


THIS  PAGE  INTENTIONALLY  LEFT  BLANK 


IV 


ABSTRACT 


In  recent  years  there  has  been  an  increased  interest  in  effective  individual  control 
and  enhanced  security  measures,  and  face  recognition  schemes  play  an  important  role  in 
this  increasing  market.  In  the  past,  most  face  recognition  research  studies  have  been  con¬ 
ducted  with  visible  imaging  data.  Only  recently  have  IR  imaging  face  recognition  studies 
been  reported  for  wide  use  applications,  as  uncooled  IR  imaging  technology  has  im¬ 
proved  to  the  point  where  the  resolution  of  these  much  cheaper  cameras  closely  ap¬ 
proaches  that  of  cooled  counterparts.  This  study  is  part  of  an  on-going  research  con¬ 
ducted  at  the  Naval  Postgraduate  School  which  investigates  the  feasibility  of  applying  a 
low  cost  uncooled  IR  camera  for  face  recognition  applications.  This  specific  study  inves¬ 
tigates  whether  nonlinear  kernel-based  classifiers  may  improve  overall  classification 
rates  over  those  obtained  with  linear  classification  schemes.  The  study  is  applied  to  a  50 
subject  IR  database  developed  in  house  with  a  low  resolution  uncooled  IR  camera.  Re¬ 
sults  show  best  overall  mean  classification  performances  around  98.55%,  which  repre¬ 
sents  a  5%  performance  improvement  over  the  best  linear  classifier  results  obtained  pre¬ 
viously  on  the  same  database.  This  study  also  considers  several  metrics  to  evaluate  the 
impacts  variations  in  various  user-specified  parameters  have  on  the  resulting  classifica¬ 
tion  performances.  These  results  show  that  a  low-cost,  low-resolution  IR  camera  com¬ 
bined  with  an  efficient  classifier  can  play  an  effective  role  in  security  related  applications 
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EXECUTIVE  SUMMARY 


Numerous  face  recognition  studies  have  been  reported  in  the  literature  over  the 
years  due  to  their  increasing  usage  in  security,  access  control,  biometric  and  other  types 
of  applications.  In  the  past  most  research  studies  have  been  conducted  in  visible  imaging 
data  and  only  recently  have  IR  imaging  face  recognition  studies  been  reported.  IR  imag¬ 
ing  offers  the  main  advantage  over  visible  imaging  of  being  invariant  to  illumination 
changes.  Good  resolution  IR  imaging  has  been  restricted  to  very  expensive  cooled  de¬ 
vices  which  slowed  the  interest  in  IR  face  recognition  down  until  the  recent  past.  How¬ 
ever,  technological  advances  have  increased  the  resolution  of  uncooled  IR  cameras  to  the 
point  that  their  resolution  closely  approaches  that  of  cooled  devices  at  a  fraction  of  the 
cost.  This  study  extends  research  conducted  earlier  by  Pereira  [Pereira,  2002]  and  Lee 
[Lee,  2004],  who  collected  an  infrared  (IR)  face  database  using  a  low  resolution  IR  cam¬ 
era  and  considered  linear  classifier  approaches.  This  study  explores  the  performance  of  a 
kernel-based  nonlinear  pattern  classification  algorithm  applied  to  the  same  50  subject 
database  used  by  Lee. 

The  nonlinear  kernel-based  Generalized  Discriminant  Analysis  (GDA)  consid¬ 
ered  here  is  basically  an  extension  of  the  Linear  Discriminant  Analysis  (LDA),  where  the 
data  is  implicitly  projected  into  a  new  higher  dimensional  nonlinear  transformed  domain 
prior  to  the  application  of  the  LDA  step.  As  a  result,  the  GDA  approach  takes  advantages 
of  the  nonlinear  characteristics  of  data,  which  LDA  does  not  have  access  to  for  higher 
classification  rates. 

First,  this  study  investigates  several  distance  measures  and  selects  the  “best”  one 
to  apply  to  the  data  under  investigation.  It  also  considers  various  user-specified  parame¬ 
ters  such  as  the  specific  type  of  kernel  function,  the  number  of  eigenvectors  selected  in 
the  GDA  step  and  investigates  their  resulting  impacts  on  overall  mean  classification  per¬ 
formances.  As  part  of  the  study,  the  concept  of  distance  maps  is  defined  to  visually 
evaluate  and  compare  classifier  performances,  and  apply  concepts  of  confidence  inter¬ 
vals,  confidence  scores,  and  rank-scores  to  assist  the  user  in  designing  the  “best”  classi¬ 
fier. 


xv 


The  study  implements  a  cross-validation  variant  to  insure  results  derived  are  sta¬ 
tistically  meaningful,  and  software  implementation  steps  are  vectorized  to  minimize  the 
associated  computational  load.  Results  show  that  the  GDA  implementation,  when  select¬ 
ing  the  Mahalanobis  Angular  distance,  a  polynomial  kernel  of  degree  2  and  the  500  top 
eigenvectors  for  the  GDA  projection  step,  provides  the  best  mean  classification  results 
(98.55%).  These  results  correspond  to  a  5%  improvement  over  the  best  derived  with  the 
linear  Fisherface  implementation,  and  show  that  significant  improvement  may  be  gained 
by  implementing  nonlinear  kernel-based  schemes.  Results  also  show  that  a  low-cost, 
low-resolution  IR  camera  combined  with  an  efficient  classifier  can  play  an  effective  role 
in  security  related  applications. 


I. 


INTRODUCTION 


Numerous  face  recognition  studies  have  been  reported  in  the  literature  [Adini, 
1997;  Hearst,  1998;  Liu,  April  2004]  over  the  years  due  to  their  increasing  usage  in  secu¬ 
rity,  access  control,  biometric  and  other  types  of  applications.  In  the  past,  most  research 
studies  have  been  conducted  in  visible  imaging  data  and  only  recently  have  infrared  (IR) 
imaging  face  recognition  studies  been  reported  [Pereira,  2002;  Chen,  2003;  Socolinsky, 
2003;  Lee,  2004],  IR  imaging  offers  the  main  advantage  over  visible  imaging  of  being 
invariant  to  illumination  changes  [Chen,  2003],  Until  the  recent  past,  good  resolution  IR 
imaging  was  restricted  to  very  expensive  cooled  cameras  which  lowered  the  interest  in  IR 
face  recognition.  However,  technological  advances  have  increased  the  resolution  of  un¬ 
cooled  IR  cameras  to  the  point  that  their  resolution  closely  approaches  that  of  cooled 
cameras  at  a  fraction  of  the  cost. 

This  study  is  an  extension  to  the  research  conducted  earlier  by  Pereira  [Pereira, 
2002]  and  Lee  [Lee,  2004],  First,  Pereira  initialized  the  study.  He  developed  an  uncooled 
IR  image  capture  system,  generated  a  small  database  of  14  adult  subjects,  collected  in  a 
controlled  indoor  environment,  and  implemented  two  classic  linear  classification  algo¬ 
rithms  using  Matlab,  the  Principal  Component  Analysis  (PCA),  and  the  Fisherface 
[Pereira,  2002],  Later  Lee  extended  the  database  to  50  adult  subjects  and  applied  the 
same  two  linear  classification  algorithms  [Lee,  2004],  This  study  extends  the  previous 
research  to  a  kernel-based  nonlinear  classification  approach,  the  Generalized  Discrimi¬ 
nant  Algorithm,  (GDA)  and  compares  classification  performances  with  those  obtained 
using  linear  implementations. 

The  following  sections  briefly  reviews  the  concepts  of  IR  radiation,  followed  by  a 
presentation  of  the  procedures  used  for  image  collection. 

A.  THEORETICAL  BACKGROUND 

The  IR  radiation  is  the  section  of  the  electromagnetic  spectrum  whose  wave¬ 
lengths  lie  primarily  between  1-100  pm.  This  wavelength  range  is  above  the  visible 
spectrum  range  and  below  the  microwave  range  [Dereniak,  1996],  Indicatively,  Fig.  1.1 
illustrates  the  spectrum,  marking  the  ultraviolet,  visible  and  IR  regions. 
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Figure  1.1.  Visible  Spectrum  versus  the  Ultraviolet  and  the  IR  Ranges  (From  [Sierra 

Pacific  Corp.,  2004].). 


Though  the  IR  radiation  is  not  visible,  it  can  be  perceived  in  the  form  of  thermal 
energy,  as  all  objects  which  have  temperatures  greater  than  0  K  emit  thermal  radiation. 
The  amount  of  thermal  energy  an  object  emits  depends  on  both  the  fourth  power  of  its 
absolute  temperature  and  its  emissivity.  Emissivity  of  an  object  indicates  to  which  degree 
it  emits  thermal  radiation,  and  it  takes  a  value  between  0  and  1 .  The  larger  the  emissivity 
of  an  object  is,  the  larger  the  thermal  radiation  it  emits,  and  is  usually  a  characteristic  de¬ 
termined  by  the  material  structure  of  the  object’s  surface  (i.e.,  its  molecular  characteris¬ 
tics).  A  blackbody  is  defined  to  be  the  object  that  has  emissivity  value  equal  to  1.  This 
ideal  case  of  an  object  does  not  exist  in  nature. 

The  human  body  has  an  emissivity  of  about  0.98,  at  310  K  [Jones,  1998],  Addi¬ 
tionally,  each  individual  has  a  vein  and  tissue  structure  which  is  peculiar  to  him/her,  re¬ 
sulting  in  the  unique  emission  of  a  thermal  radiation  pattern.  The  temperature  variation 
across  the  face  of  a  typical  person  is  about  7°  C.  Since  the  spatial  variation  of  one’s  facial 
temperature  profile  is  not  very  abrupt,  the  low  spatial  resolution  of  the  IR  cameras  gener¬ 
ally  should  not  affect  the  quality  of  the  images.  For  this  reason,  the  IR  radiation  emitted 
by  the  human  face  is  ideal  for  high  classification  performance  scores  in  face  recognition 
applications. 

An  additional  advantage  of  using  IR  images  as  compared  to  visible  images  for 
face  recognition  applications  is  that  IR  images  are  not  affected  by  variations  in  the  light¬ 
ing  of  a  particular  environment  while  visible  images  tend  to  lose  much  of  their  informa¬ 
tion  in  poorly  lit  conditions  [Chen,  2003],  However,  apart  from  the  low  resolution,  the 
main  drawback  of  using  IR  images  is  the  relatively  high  cost  of  IR  cameras  (purchase  and 
maintenance).  This  additional  cost  is  due  to  the  fact  that  cooled  IR  cameras  require  cryo- 
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genic  cooling  in  order  to  achieve  high  temperature  resolution.  In  order  to  make  such  im¬ 
agery  more  cost-effective,  recent  advances  in  uncooled  infrared  camera  technology  pro¬ 
vide  a  less  expensive  means  of  acquiring  infrared  images. 

Apart  from  face  recognition  applications,  IR  imaging  has  also  been  successfully 
used  in  many  civilian  and  military  applications  such  as  Forward  Looking  Infrared  (FLIR) 
systems  for  fighter  aircrafts  (F-14,  F-16,  F-18,  etc.),  helicopters,  and  IR  targeting  sys- 
tems-missiles  which  use  IR  radiation  to  locate  the  targets  [Sierra  Pacific  Corp.,  2004], 

B.  EQUIPMENT  USED 

The  IR  face  images  were  collected  at  the  Naval  Postgraduate  School  (NPS),  using 
an  uncooled  IR  camera  (IR  160),  manufactured  by  Infrared  Solutions. 

Even  though  the  spatial  resolution  (160x120  pixels)  and  temperature  resolution 
(60  mK)  provided  by  the  camera  are  relatively  low,  the  wavelength  region  in  which  it  op¬ 
erates  (8-14  pm)  is  critical  for  capturing  wavelengths  emitted  by  human  body  at  about 
310  K  (which  corresponds  to  peak  wavelength  of  10  pm).  This  phenomenon  is  illustrated 
in  Fig.  1.2. 


Wavelength  (pm) 


Figure  1 .2.  Blackbody  spectrum  for  temperatures  near  that  of  the  human  body  (From 

[Pereira,  2002].). 
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The  camera  used  here  is  said  to  be  uncooled  because  there  is  no  requirement  to 
cool  the  camera’s  sensor,  which  is  a  requirement  for  ordinary  cooled  cameras.  The  de¬ 
tailed  specifications  of  the  camera  can  be  found  in  [Infrared  Solutions  Inc.,  2004], 

The  IR-160  is  connected  to  a  monitor  to  allow  for  previewing  the  IR  images  be¬ 
fore  their  capture.  The  camera’s  output  produces  IR  images  in  a  digital  format.  The  digi¬ 
tal  image  is  then  transmitted  to  a  PC  via  an  8-bit,  RS-232  serial  port.  Next,  the  collected 
images  are  cropped  [Lee,  2004],  reshaped  in  a  column-vector  format,  and  inserted  in  the 
database  matrix  (this  process  will  be  discussed  further  in  Chapter  IV). 

The  equipment  used  in  the  IR  database  collection  is  illustrated  in  Fig.  1.3.  The  de¬ 
tailed,  geometric  layout  of  the  exact  location  of  the  individual  posing  for  capturing  his 
picture,  his/her  gazing  directions  towards  marked  spots  on  the  facing  wall,  and  the  loca¬ 
tion  of  the  pieces  of  equipment  used  (IR  camera,  monitor,  PC)  are  shown  in  Figs.  1 .4  and 
1.5. 

The  nine  spots  on  the  wall,  (forming  a  square  with  the  center  marked  on  it),  corre¬ 
spond  to  the  gazing  positions  of  the  individual,  having  a  specific  facial  expression  (neu¬ 
tral,  smiling,  or  pronouncing  the  vowel  “u”),  and  for  which  images  were  captured.  In  ad¬ 
dition,  a  tenth  image  was  collected  with  random  gazing  position,  located  within  the  de¬ 
fined  square.  Each  group  of  ten  images  constitutes  one  out  of  three  possible  sections  (fa¬ 
cial  expressions)  collected  for  a  given  subject  (class).  Additional  details  regarding  the 
organization  of  the  IR  image  collection  can  be  found  in  [Lee,  2004], 
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Figure  1.3.  Equipment  components  used  for  IR  data  collection  (From  [Lee,  2004].). 


Figure  1.5.  Forward  aspect  of  IR  data  collection  system  layout  (From  [Pereira, 

2002].). 
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C.  THESIS  OVERVIEW 

This  thesis  consists  of  four  chapters.  Chapter  I  presents  the  basic  idea  behind  the 
work  conducted.  Chapter  II  reviews  the  concepts  of  linear  classifiers.  Extensions  to 
nonlinear  classification  schemes  are  described  in  Chapter  III.  Next,  Chapter  IV  presents 
the  experiments  conducted  on  the  face  database  and  discusses  results  obtained.  Finally, 
conclusions  and  recommendations  for  further  research  are  presented  in  Chapter  V. 
Mathematical  details  can  be  found  in  Appendix  A.  The  software  implemented  during  the 
study  is  included  in  Appendix  B.  Last,  Appendix  C  includes  typical  performance  meas¬ 
urement  plots. 

D.  SUMMARY 

This  chapter  presented  the  principles  of  IR  imaging  for  face  recognition  applica¬ 
tions.  It  also  discussed  some  of  the  advantages  of  using  IR  over  visible  imaging  and  it 
presented  some  of  the  technical  characteristics  of  the  uncooled,  IR-160  camera  used  in 
this  study,  together  with  some  characteristics  of  the  equipment  used  for  the  IR  data  col¬ 
lection.  The  following  chapter  presents  the  linear  classification  algorithms  considered  for 
benchmarking  purposes  against  the  nonlinear  kernel  based  classifier  approach,  which  is 
the  main  focus  of  this  thesis. 
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II.  LINEAR  CLASSIFICATION  ALGORITHMS 


This  chapter  first  presents  a  brief  introduction  to  two  classical  linear  classification 
schemes  widely  used  in  the  pattern  recognition  community,  the  Principal  Component 
Analysis  (PCA)  and  the  Linear  Discriminant  Analysis  (LDA).  Next,  it  presents  a  brief 
comparison  between  the  two  schemes. 

A.  INTRODUCTION 

The  PCA  and  the  LDA  schemes  have  been  used  extensively  and  successfully  in 
applications  for  image  and  speech  recognition,  machine  learning,  financial  planning,  and 
various  other  scientific  areas  [Martinez,  2001].  The  database  considered  in  the  present 
face  recognition  study  includes  50  adult  subjects  (i.e.,  classes),  each  having  thirty  infrared 
images.  Identical  samples  and  images  were  used  by  Lee  in  a  study  conducted  in  2004 
[Lee,  2004],  Two  main  types  of  class  assignments  are  typically  available  in  pattern  rec¬ 
ognition  applications,  namely  the  “closed  set”  and  the  “open  set”  implementations. 
Whereas  the  closed  set  implementation  assumes  that  all  testing  trials  belong  to  the  data¬ 
base  used  to  design  the  classification  algorithm,  no  such  assumption  is  made  in  the  open 
set  implementation.  As  a  result,  testing  data  trials  may  not  necessarily  be  assigned  to  a 
given  class  in  open  set  implementations.  The  present  study  was  restricted  to  the  closed  set 
implementation  type,  i.e.,  all  individuals  tested  were  considered  to  be  in  the  training  da¬ 
tabase. 

B.  PRINCIPAL  COMPONENT  ANALYSIS  (PCA)  -  EIGENFACE  IMPLE¬ 
MENTATION 

1.  Introduction 

Karl  Pearson  first  introduced  the  Principal  Component  Analysis  in  1901  [Lee, 
2004],  The  PCA  method  was  created  initially  for  data  compression,  but  has  also  been  ex¬ 
tensively  used  with  success  in  pattern  recognition  applications.  The  basic  idea  behind 
PCA  is  to  represent  features  extracted  from  the  data  in  terms  of  the  linearly  independent 
eigenvectors  obtained  from  the  data’s  covariance  matrix.  Dimension  reduction  is  ob¬ 
tained  by  approximating  the  feature  vectors  using  a  subset  selected  from  the  top  eigen¬ 
vectors  (i.e.,  those  associated  with  the  eigenvalues  with  larger  magnitude),  thereby  pre¬ 
serving  most  of  the  feature  variance  information.  The  PCA  approach  has  been  used  ex- 
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tensively  in  classification  applications  even  though  it  is,  theoretically,  not  well-suited  to 
deal  with  such  problems  because  the  ability  to  discriminate  between  individuals  may  of¬ 
ten  be  contained  in  the  small  details  (i.e.,  associated  with  eigenvectors  with  smaller  mag¬ 
nitudes). 

2.  Algorithm  Description 

The  PCA  algorithm  (also  known  as  “Eigenspace  Projection”,  or  “Karhunen  Lo- 
eve”  decomposition)  seeks  a  linear  projection  direction  which  best  represents  the  data  in 
a  norm  sense  [Fargues,  2001],  Research  by  [Yambor,  2000]  presents  a  good  illustration 
of  this  process  and  is  discussed  next.  Recall  that  all  available  P  images  are  stored  in  ma¬ 
trices  of  dimension  ( m  x  n).  First,  these  images  are  reshaped  into  a  set  of  column  vectors 

jx'j  |  of  length  N  =  mn.  Next,  the  reshaped  vectors  are  stacked  column-wise,  to  form 


the  ( N  x  P)  dimensional  training  image  data  matrix  X  =  ^x1 
all  mean  image  of  length  N  is  given  by: 

1  V,f  i 

m  =  —y  x , 

p  t—li=\  ’ 


As  a  result,  the  over- 


(2.1) 


which  leads  to  the  corresponding  ith  reshaped  image  vector,  defined  by  the  equation: 


x'  =  x'  - m  =  [x1,x2,x3,...,xAr]r . 


(2.2) 


As  a  result,  the  centered  ( N  x  P)  dimensional  training  data  matrix  X  is  given  by 

X  =  [x1 1  x2 1  x3 1 ...  |  xp],  (2.3) 

where  the  vertical  bars  denote  that  the  column  vectors  jx'  j  ^  p  are  appended  in  se¬ 
quence,  in  order  to  form  the  matrix  X.  The  N-dimensional  data  covariance  matrix  is 
given  by: 

Q  =  XXT .  (2.4) 

Note  that  the  covariance  matrix,  Q,  has  up  to  min(P,  N )  non  zero  eigenvalues.  Next,  also 
recall  that  the  eigenvector  associated  with  the  largest  eigenvalue  is  that  associated  to  the 
direction  with  the  highest  energy,  and  so  on.  Therefore,  the  PCA  projection  matrix  V  is 
made  up  from  the  top  K  eigenvectors  vK,  where  K  is  user-specified,  as: 
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(2.5) 


F  =  [v1|v2|....|vJf]. 

As  a  result,  the  projected  centered  data  S'  can  be  expressed  as: 

S'  =  VTX.  (2.6) 

Next,  the  projected  data  is  used  to  compute  the  projected  “class-centroids”  as  the 
means  of  the  projected  data  associated  with  each  class  resulting  in  a  class-centroid  matrix 
of  dimension  ( K  x  C),  where  C  is  the  number  of  classes.  Lastly,  testing  images  are  pro¬ 
jected  onto  the  smaller  dimensional  space  using  the  projection  matrix,  V,  defined  with  the 
training  data.  The  classification  decision  is  obtained  by  selecting  as  the  class  that  one 
which  is  closest  in  norm  to  the  projected  testing  image.  Various  types  of  norms  may  be 
applied,  and  in  most  cases,  the  simple  Euclidean  distance  (Norm-2)  is  selected. 

According  to  [Belhumeur,  1997],  the  PCA  method  yields  projection  directions 
that  maximize  the  total  scatter  across  all  images.  In  order  to  find  these  projections,  the 
training  data  covariance  matrix  has  to  be  calculated  and  its  eigenvalue-eigenvector  de¬ 
composition  performed.  Eigenvectors  associated  with  the  top  eigenvalues  in  magnitude 
constitute  the  projections’  directions  mentioned  above.  However,  PCA  still  retains  un¬ 
wanted  information,  e.g.,  that  due  to  facial  expressions  and  lighting.  According  to  a  study 
reported  in  [Adini,  1997],  intra-class  variations  that  were  generated  by  altering  the  direc¬ 
tion  of  one’s  gaze  and  lighting  conditions  may  be  larger  than  inter-class  variations.  As 
discussed  in  [Pereira,  2004],  classification  errors  may  be  reduced  by  discarding  the  first 
few  top  eigenvectors  when  dealing  with  a  small  IR  database.  However,  Lee  showed  that 
such  a  trend  does  not  extend  as  the  database  size  [Lee,  2004],  Nevertheless,  it  remains 
true  that  PCA  is  not  optimally  designed  to  discriminate  between  classes.  However,  PCA 
can  help  to  reduce  problems  associated  with  dimensionality  due  to  its  dimension  reduc¬ 
tion  capability.  This  process  will  be  more  fully  described  in  the  next  section. 

Implementation  of  the  PCA  method  requires  defining  the  “Total-Scatter”  matrix, 
ST  (i.e.,  the  data  covariance  matrix)  via  the  following  equation: 
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(2.7) 


T 

ST=ZJk=l(Xk-/J)(Xk-V)  > 

where  xk  is  an  image  reshaped  as  a  column  vector  and  //  is  the  mean  of  all  training  im¬ 
ages. 


The  projection  vectors  matrix,  W  t,  is  chosen  in  such  a  way  that  it  maximizes  the 
determinant  of  the  total  scatter  matrix  of  the  projected  training  data  matrix  according  to: 

Wopt  =arg(maxr  \WT STW\)  =  [wx,w2,...wml  (2.8) 

where  wl,w2,...,wm  constitute  the  set  of  TV-dimensional  projection  eigenvectors  (sorted 
by  descending  eigenvalue  magnitude  that  are  obtained  from  ST ). 

The  PC  A  is  implemented  algorithmically  using  the  following  steps: 

•  Extract  feature  vectors  from  data  images,  which  are  obtained  from  the  sub¬ 
jects  and  reshaped  into  a  column-vector  format. 

•  Organize  the  training  data  into  a  matrix  form  (training-data-matrix)  by 
appending  the  column  vectors,  which  represent  the  various  training  im¬ 
ages,  such  that  images  of  the  same  class  are  appended  together.  Conse¬ 
quently,  all  operations  take  place  in  a  column-wise  fashion,  in  the  various 
steps  of  the  algorithm. 

•  Calculate  the  covariance  matrix  of  the  training  data  matrix. 

•  Calculate  the  eigendecomposition  of  the  covariance  matrix. 

•  Sort  the  eigenvalues  in  decreasing  order  of  their  magnitude,  along  with 
their  associated  eigenvectors.  Keep  all  eigenvectors  that  are  associated 
with  a  particular  eigenvalue  magnitude  above  a  user-defined  threshold. 

•  Project  the  training  data  matrix  onto  the  eigenvector  subspace. 

•  Calculate  the  class  centroids  from  the  projected  training  data.  The  class 
centroids  are  used  to  characterize  each  specific  class. 

•  Project  the  testing  data  onto  the  eigenvectors  subspace. 

•  Calculate  distances  between  each  projected  testing  image  to  all  class  cen¬ 
troids. 

•  Assign  each  testing  image  to  a  specific  class  by  selecting  the  class,  which 
corresponds  to  the  smallest  distance  between  the  projected  testing  and  all 
previously-computed  training  class  centroids. 
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C.  FISHERFACE  APPROACH 

1.  Introduction 

The  Linear  Discriminant  Analysis  (LDA)  algorithm,  also  known  as  Fisher  Linear 
Discriminant  (FLD)  analysis,  is  based  on  a  linear  projection  of  the  data  onto  a  lower  di¬ 
mensional  feature  space,  which  identifies  the  projection  that  best  discriminates  among 
classes,  rather  than  those  directions  that  best  represent  the  data  [Martinez,  2001],  As  a 
result,  the  LDA  approach  is  less  affected  than  PCA  by  variations  in  lighting  and  face  ex¬ 
pression  [Belhumeur,  1997]. 

2.  Algorithm  Description 

The  LDA  approach  uses  the  concept  of  intra-class  (or  “Within-Class”)  and  inter¬ 
class  (or  “Between-Class”)  scatter  matrices  [Yambor,  2000].  The  Within-Class-Scatter 
matrix  (WCS)  SI  for  class  i  is  defined  as: 

=  (2.9) 

xeX, 

where  mj  is  the  mean  image  of  class  i.  The  overall  WCS  matrix,  Sw,  is  defined  as  fol¬ 
lows: 

(2.10) 

1=1 

where  C  is  the  number  of  classes. 

In  a  similar  fashion,  the  Between-Class-Scatter  matrix  (BCS)  SB  is  defined  as: 

(2-1 1) 

i= 1 

where  m  is  the  mean  of  all  the  training  images  and  »  is  the  number  of  images  in  the  ith 
class. 

The  LDA  projection  direction  matrix,  W  t  is  defined  as  the  matrix  which  maxi¬ 
mizes  the  ratio  of  the  determinant  of  the  Between-Class-Scatter  SB  to  the  determinant  of 
the  Within-Class-Scatter  Sw ,  both  defined  on  the  training  data  [Belhumeur,  1997].  This 
is  denoted  by  the  equation,  known  as  the  “Rayleigh  quotient”: 
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f 


(2.12) 


KPt  =  ar8 


|^max^ 


WTSBW  | 
WTSwW\, 


Appendix  A.  1  shows  that  finding  the  solution  for  a  particular  Rayleigh  quotient  is 
equivalent  to  solving  the  following  generalized  eigenvalue-eigenvector  problem,  (pro¬ 
vided  that  the  WCS  matrix  Sw  is  non-singular): 

SBwi=X,Sv>w„i  =  (2.13) 


where  A.  is  the  ith  generalized  eigenvalue,  wt  is  the  ith  generalized  eigenvector  (obtained  by 
solving  the  above  generalized  eigenproblem),  and  W  t  =  [wv  w2,...,  wk]  is  the  eigenvector 

matrix  solution.  Recall  that  there  is  a  maximum  potential  occurrence  of  C  - 1  nonzero 
generalized  eigenvalues,  as  reviewed  in  Appendix  A.2  [Belhumeur,  1997]).  TheseC-1 
nonzero  eigenvectors  constitute  the  Fisher  basis  vectors,  and  are  typically  sorted  by  order 
of  decreasing  eigenvalues. 

Note  that  the  A-dimensional  WCS  matrix  Sw  has  a  maximum  rank  equal  to 
P-C,  where  P  is  the  number  of  images  in  the  training  set,  and  C  is  the  number  of 
classes,  (reviewed  in  Appendix  A.3).  In  addition,  the  A-dimensional  BCS  matrix  SB  has 
a  maximum  rank  equal  to  C  - 1,  as  this  matrix  is  the  combination  of  C  matrices  of  rank  1 . 
Note  also  that  the  number  of  images,  P,  is  usually  smaller  than  the  number  of  pixels  per 
image  N,  in  face  recognition  applications  which  result  in  singular  scatter  matrices.  When 
all  of  the  above  is  taken  into  account,  the  Rayleigh  quotient  can  no  longer  be  solved  di¬ 
rectly  [Belheumer,  1997],  A  possible  solution  to  the  matrix  Sw  singularity  problem  is  to 

reduce  the  data  dimension,  in  order  to  insure  that  only  non-singular  scatter  matrices  are 
produced  prior  to  applying  the  LDA  scheme,  [Belhumeur,  1997],  The  resulting  scheme  is 
called  the  “Fisherface  approach.” 

Thus,  in  the  Fisherface  approach,  original  images  are  initially  projected  using  the 
PC  A  approach  in  order  to  reduce  the  dimension  of  the  projected  data  to  P-C .  Then,  the 
LDA  approach  is  applied  to  further  reduce  the  data,  to  dimension  C-l.  Next,  class  cen¬ 
troids  are  computed  using  the  training  data,  as  it  was  done  for  the  PCA  approach.  Finally, 
testing  images  are  projected  onto  the  smaller  dimensional  space  using  the  projection  ma- 
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trices  defined  in  PCA  and  LDA  steps.  Final  class  decision  is  obtained  by  selecting  for 
each  testing  image  the  class,  which  is  closest  in  norm  to  the  training  data  class  centroids. 
D.  PCA  AND  LDA  COMPARISONS 

Typical  implementations  of  PCA  and  LDA  approaches  can  be  compared  by  ap¬ 
plying  them  to  two  2-dimensional  data  classes  (each  one  with  five  samples),  as  shown  in 
Fig.  2.1,  where  the  two  classes  are  represented  by  stars  and  circles,  respectively.  Figure 
2.1  also  shows  their  1 -dimensional  projections,  as  defined  by  the  PCA  (solid  line)  and 
the  LDA  (dashed  line)  approaches.  This  figure  illustrates  that  the  PCA  approach  causes 
projected  classes  to  be  interlaced,  while  the  LDA  approach  best  preserves  discrimination 
between  the  projected  classes.  The  respective  Matlab  code  that  performs  the  subject  PCA 
LDA  projection  comparison  is  provided  in  Appendix  B. 


PCA  vs  LDA  Projections 


Figure  2.1.  PCA  and  LDA  projection  directions  for  the  two  2  classes,  “o”  and 
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E.  CONCLUSIONS 

This  chapter  discussed  two  basic  linear  classifiers,  PCA  and  LDA.  The  chapter 
also  illustrated  the  notion  that  LDA  is  better  suited  to  classification  problems  than  is 
PCA.  Nevertheless,  it  should  be  recognized  that  either  algorithm  is  successful  for  data  for 
which  linear  projections  are  well  matched  to  the  data  structure.  Extension  of  the  ap¬ 
proaches  to  nonlinear  classifiers  is  considered  in  the  next  chapter. 
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III.  NONLINEAR  CLASSIFICATION  ALGORITHMS 


This  chapter  extends  the  linear  classifiers  presented  earlier  to  nonlinear  schemes. 
First,  we  present  the  basic  idea  behind  nonlinear  classification  approaches  and  introduce 
the  concept  of  kernels,  which  is  the  key  idea  behind  nonlinear  schemes.  Next,  we  de¬ 
scribe  the  “kernel  trick”  which  makes  such  procedures  computationally  feasible.  Finally, 
we  present  the  specific  application  of  kernel-based  implementations  to  the  linear  dis¬ 
criminant  analysis  called  the  Generalized  Discriminant  Analysis  (GDA). 

A.  THEORETICAL  BACKGROUND 

Even  though  LDA  is  designed  for  discrimination  applications,  it  is  based  on  a  lin¬ 
ear  projection  approach.  This  may  result  in  performance  degradations  when  the  data  be¬ 
havior  is  not  well-suited  to  such  design  constraints,  especially  when  classes  are  not  line¬ 
arly  separable.  A  significant  amount  of  research  has  been  conducted  recently  in  the  area 
of  kernel-based  schemes,  much  of  which  has  led  to  the  implementation  of  nonlinear  pro¬ 
jections.  The  kernel  theory  was  first  used  in  pattern  recognition  applications  by  Aizer¬ 
man,  [Aizerman,  1964;  Baudat,  2003].  The  main  kernel-based  learning  methods  are  the 
“Support  Vector  Machines”  (SVM),  the  “Kernel  Principal  Component  Analysis” 

(KPCA),  and  the  “Kernel  Fischer  Discriminant  Analysis”  (KFD),  also  known  as  “Gener¬ 
alized  Discriminant  Analysis”  (GDA),  [Muller,  2001].  All  of  these  methods  have  been 
widely  applied  in  classification,  regression,  pattern  and  object  recognition,  optical  charac¬ 
ter  recognition  (OCR),  text  categorization,  time  series  prediction,  gene  expression  profile 
analysis,  protein  and  DNA  analysis  and  other  applications  [Muller,  2001;  Roobaert,  1999; 
Joachims,  1998;  Brown,  2000]. 

Specifically,  the  following  results  have  been  recently  reported  when  applying 
kernel-based  schemes  in: 

•  Face  recognition:  Various  nonlinear  discrimination  algorithms  have  been 
developed  [Baudat,  2000,  2003;  Socolinsky,  2003;  Liu,  2004]  by  taking 
into  account  nonlinear  features  of  the  training  data,  thereby  extending 
classification  to  data  which  cannot  be  solved  using  linear  algorithms.  For 
example,  the  GDA  method  has  been  applied  successfully  with  various 
kernel  functions  [Liu,  April  2004;  Liu,  May  2004;  Hearst,  1998], 
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•  3-Dimensional  object  recognition:  the  SVM  algorithm  has  been  succes¬ 
sively  applied  to  recognize  3-dimensional  objects  when  a  limited  number 
of  training  views  are  available  [Roobaert,  1999]. 

•  Text  categorization:  The  objective  of  this  application  is  to  assign  docu¬ 
ments  into  a  set  of  defined  document  types,  such  as  “news”,  “historical”, 
“business”,  “scientific”,  “medical”,  etc.  [Joachims,  1998],  SVM  ap¬ 
proaches  have  been  applied  successfully  to  organize  network  data,  docu¬ 
ment  databases,  or  even  to  learn  a  user’s  web  browsing  preferences 
[Joachims,  1998]. 

•  Optical  Character  Recognition  (OCR):  The  goal  of  this  application  is  to 
identify  handwritten  samples.  The  first  real  world  experiments  were  ap¬ 
plied  to  OCR  data  using  a  basic  SVM  implementation  and  were  shown  to 
perform  well  (with  error  rates  of  only  around  0.7%)  [Muller,  2001], 

•  Gene  expression  profile  and  DNA  and  protein  analysis:  SVM  approaches 
have  also  been  applied  to  classify  genetic  functionality.  Such  applications 
require  the  extraction  of  data  from  the  DNA  strand  relating  to  how  a  gene 
expresses  itself  [Brown,  2000;  Muller,  2001]. 

1.  Introduction  to  Kernel  Theory 

The  basic  idea  behind  Kernel  methods  is  to  map  the  data,  xeX,  into  a  large  di¬ 
mensional  feature  space,  F,  via  a  function,  <p,  [Baudat,  2003],  where  the  mapped  data 
have  characteristics  which  can  be  manipulated  with  simple  computational  methods.  This 
mapping  may  seem  counter-intuitive  due  to  the  issues  raised  by  the  “ curse  of  dimension¬ 
ality”  [Muller,  2001].  Mapping  essentially  shows  that  the  number  of  samples  required  to 
set-up  an  estimation  problem  increases  exponentially,  with  the  dimension  of  the  sample 
space.  Nevertheless,  statistical  learning  theory  also  states  that  learning  in  the  high  dimen¬ 
sional  feature  space,  F,  can  be  simple  if  low-complexity  criteria  and  specific  mapping 
functions  are  selected.  Therefore,  the  key  issue  when  dealing  with  kernel-based  schemes 
lies  in  the  selection  of  the  specific  mapping  <p .  Rather  than  being  carried  out  explicitly, 
this  calculation  is  reformulated  in  terms  of  dot  products  applied  to  the  data.  Thus,  data  in 
the  transformed  space  are  expressed  in  terms  of  a  kernel  matrix  containing  comparisons 
of  data  pairs  only,  which  makes  the  implementation  of  this  process  less  expensive. 

2.  Introduction  to  the  “Kernel  Trick” 

Assume  that  there  are  m  training  data  [Scholkopf,  2000]: 

X  =  {xl,x2,...,xm}.  (3.1) 
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Also,  assume  that  the  training  patterns,  x;,  belong  to  one  of  the  two  classes  with 
labels,  y. ,  defined  as: 

yt  e  {-1,+1},  V/  =  1,2,..., w.  (3.2) 

Therefore,  the  training  data  with  label  assignment  can  be  expressed  as: 

Oi , TlX  02 , y2), ...,  (xm ,ym )  e  X: x  {-1, +1} .  (3.3) 

Recall  that  the  goal  of  learning  applications  is  to  assign  a  label  (i.e.,  class)  to  an  unlabeled 
data  pattern.  Therefore,  its  goal  can  be  viewed  as  predicting  the  value  of  labels,  y,  for  new 
unlabeled  patterns,  xel,  in  such  a  way  that  the  ordered  pair,  (x,  y),  is  as  similar  as  pos¬ 
sible  to  ordered  pairs  defined  in  the  training  data.  Such  a  procedure  requires  the  definition 
of  a  similarity  measure  in  the  transformed  space.  As  a  result,  a  similarity  measure  be¬ 
tween  two  data  points,  x,  andx2 ,  is  implemented  as: 

k:XxX^R, 

(3.4) 

(xl,x2)->p(xl)-<p(x2)  =  k(xl,x2), 

where  (p  represents  the  mapping  function  and  can  be  rewritten  in  terms  of  a  kernel  func¬ 
tion,  k,  for  specific  choices  of  mapping,  then  implemented  in  terms  of  a  dot  product  only. 
This  process  is  known  as  the  “kernel  trick”  [Scholkopf,  2000]. 

In  order  to  be  able  to  use  dot  products  as  similarity  measurements,  input  patterns 
(i.e.,  vectors)  should  lie  in  a  dot  product  domain.  This  means  that  the  arguments  of  the 
kernel  functions  should  be  dot  products  of  the  original  input  element  vectors.  Thus,  the 
new  mapped  space,  F,  includes  the  kernel  function  values,  which  have  dot  products  as 
inputs.  Generally,  the  space  F  is  different  from  the  original  N-dimensional  space. 

Next,  we  present  a  basic  example  to  illustrate  the  “kernel  trick”  [Muller,  2001]. 

Assume  that  there  are  two  2-dimensional  classes,  “x”  and  “o”,  as  shown  on  the 
left-hand  plot  of  Fig.  3.1. 

Figure  3.1  shows  that  the  original  two  classes  are  nonlinearly  separable.  Conse¬ 
quently,  a  nonlinear  mapping  <p  is  required  to  potentially  transform  the  input  data  into  a 
higher  dimensional  space  in  order  to  enable  the  transformed  data  classes  to  be  linearly 
separable.  In  this  specific  case,  the  mapping  (p  will  be: 
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<p:R2  — »i?3. 


(3.5) 


Figure  3.1.  Illustration  of  a  kernel-based,  nonlinear  transformation  for  a  two-class 
problem.  Left  plot:  original,  2-dimensional  input  space;  Right  plot:  transformed  3- 
dimensional  feature  space.  (From  [Muller,  2001].). 

A  potentially  nonlinear  mapping  that  transforms  the  2-dimensional  element  vectors  into 
3-dimensional  element  vectors  can  be  expressed  as: 

(xl,x2)^(z1,z2,z3),  (3.6) 

where  the  transformed  space  features  are  defined  as  the  second  order  monomial  terms: 

(Zj ,  z2 ,  z3 )  □  (xf ,  yflxlx2  ,x2).  (3.7) 

Note  that  the  V2  present  in  the  above  term,  x,x2,  is  added  for  normalization  purposes  so 
that  the  nonlinear  mapping  can  lead  to  a  valid  kernel  expression,  as  shown  below. 

Next,  let  us  define  the  following  polynomial  kernel  function  as: 

k{x,y)U(x-y)\  (3.8) 

where  x  and  y  are  2-dimensional  element  vectors  (i.e.,  x  =  (Xj ,  x2 ),  and  y  =  (yx ,  y2 ) ). 

The  second  order  kernel  function  is  defined  for  d=2  as: 

k(x,y)  =  (x-y)2.  (3.9) 

For  2-dimensional  vectors  x  and  y,  the  kernel  expression  k(x,y)  may  be  rewritten  as: 
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(3.10) 


k{x,y)  =  {x-yf 

=  ((xl,x2)(yl,y2)T)2 

=  (*iTi  +x2y2f 
—  (X|  v’|  +  2xlylx2y2  +  x2y2 ) 

(Xj ,  ^x2 ,  x2 )( j-’i  ,  Ylyly2 ,  y2 ) 

=  {(p(x)-cp{y)). 

The  above  equations  show  that  the  nonlinear  mapping  (p  defined  in  Eq.  (3.7)  is 
performed  implicitly  via  the  kernel  expression,  k(x,y ) ,  by  the  use  of  dot  products  in  the 
input  domain. 

It  can  be  shown  that  such  nonlinear  mapping  potentially  offers  the  following  ad¬ 
vantages  [Scholkopf,  2000]: 

•  Similarity  measures  (dot  products)  in  the  transformed  space,  F,  can  be  de¬ 
fined  as: 

k(y  ,x2)  =  (<p(X\ )  •  <p(x2 )).  (3.11) 

•  Linear  algebra  and  analytic  geometry  concepts  may  be  applied  in  the 
transformed  space,  facilitating  data  manipulation. 

•  Flexibility  in  selecting  the  mapping  function,  <p,  allowing  for  the  develop¬ 
ment  of  specialized  algorithms. 

3.  Support  Vector  Machines  (SVM) 

SVM  is  a  type  of  pattern  classification  and  regression  method  that  uses  the  con¬ 
cept  of  kernel  functions,  in  conjunction  with  dot  products  [Scholkopf,  2000]. 

The  basic  idea  behind  SVM  is  to  identify  a  hyperplane  in  a  transformed  high  di¬ 
mensional  space  that  linearly  separates  the  mapped  classes  in  that  space.  This  identifica¬ 
tion  procedure  is  accomplished  via  special  kernel  functions  that  are  applied  to  the  input 
element  vectors. 

In  order  to  clarify  the  concept  at  the  basis  of  the  SVM  approach,  consider  the 

simple  case  of  the  classification  of  two  nonlinearly  separable  classes.  For  this  particular 

case  SVM  is  used  to  select  the  optimum  mapping  cp  of  the  input  space  into  a  space  in 

which  a  class  separating  hyperplane  can  be  defined.  This  type  of  hyperplane  is  illustrated 

in  Fig.  3.2  and  lies  approximately  in  the  middle  of  the  shortest  distance  of  the  two 

classes’  convex  hulls.  The  support  vectors  define  the  exact  location  of  the  hyperplane.  In 
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Fig.  3.2,  one  can  observe  the  support  vectors  in  this  newly  transformed  space,  F;  these 
are  a  product  of  the  elements  of  the  two  classes,  the  separating  hyperplane,  and  the  clos¬ 
est  element  vectors  of  each  class  to  the  hyperplane. 

The  key  point  behind  the  SVM  approach  is  that  the  kernel  function  of  the  element 
vectors’  dot  products  may  be  used  directly,  i.e.,  without  the  need  to  carry  out  the  mapping 
of  the  inputs  to  the  new  high  dimensional  space  explicitly. 


Figure  3.2.  Basic  idea  of  Support  Vector  Machines:  Identification  of  the  two-class 
optimum  separating  hyperplane  in  the  new,  high-dimensional  transformed  space.  (After 

[Scholkopf,  2000].). 


The  hyperplane  in  the  high-dimensional  transformed  space  is  actually  a  nonlinear 
decision  boundary  in  the  input  space.  The  aforementioned  decision  boundary,  when 
viewed  in  the  original  input  space,  is  shown  as  the  nonlinear  curve  in  Fig.  3.3,  with  the 
support  vectors  (marked  as  double  circles)  on  the  two  curved  lines  above  and  below  the 
nonlinear  decision  boundary.  In  the  subject  figure,  one  class  consists  of  circle  elements 
and  the  other  of  discus-shaped  elements. 
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Figure  3.3.  Original  2-dimensional  two-class  problem  (classes,  “o”  and  show¬ 
ing  the  support  vectors  and  the  nonlinear  optimal  boundary.  (From  [Scholkopf,  2000].). 

The  main  advantages  provided  by  the  S VM  method  are  [Scholkopf,  2000] : 

•  The  SYM  method  can  identify  and  eliminate  data  outliers. 

•  SVM  schemes  offer  flexibility  via  the  specific  selection  of  the  kernel  func¬ 
tion.  They  also  allow  for  flexibility  to  choose  the  most  suitable  similarity 
function,  and  can  handle  large  feature  spaces. 

The  main  drawbacks  of  the  SVM  method  are  [Scholkopf,  2000]: 

•  The  size  of  the  quadratic  programming  problem  that  results  from  the  SYM 
algorithm  can  be  quite  large  [Scholkopf,  2000],  resulting  in  high  computa¬ 
tional  load  and  the  potential  of  numeric  problems. 

•  The  potential  of  high  noise  levels  in  the  classes’  element  vectors  increases 
the  SVM  implementation  complexity  as  additional  conditions  have  to  be 
met  to  minimize  noise  impacts  on  the  classification  accuracy,  resulting  in 
an  increased  computational  load. 

We  note  that  optimization  algorithms  addressing  the  above  issues  have  been  re¬ 
cently  proposed  in  the  literature  [Scholkopf,  2000]. 
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4.  Kernel  PCA 

The  Kernel  PCA  executes  a  nonlinear  mapping  of  the  input  space  to  a  new  high 
dimensional  space,  using  the  “kernel  trick”.  Next,  the  classic  linear  PCA  is  applied  in  the 
transformed  space,  resulting  in  a  linear  eigenvalue-eigenvector  decomposition  problem 
of  a  matrix,  whose  data  elements  are  the  values  of  the  kernel  function  [Hearst,  1998]. 
Numerous  results  have  been  reported  with  this  scheme  which  shows  superior  perform¬ 
ance  over  the  basic  PCA  implementation,  [Scholkopf,  1998;  Kwang,  2002;  Lu,  2003], 

B.  GENERALIZED  DISCRIMINANT  ANALYSIS  (GDA) 

1.  Introduction 

Kernel  concepts  may  be  applied  to  classical  algorithms,  such  as  the  PCA  or  LDA. 
Applying  kernel  theory  to  LDA  means  that  the  linear  LDA  algorithm  is  applied  to  a 
nonlinear  transformed  data,  resulting  in  a  nonlinear  algorithm  without  the  drawbacks  pre¬ 
sent  in  a  direct  nonlinear  transformation  of  the  algorithm.  Therefore,  the  GDA  solves  an 
ordinary  eigenvalue-eigenvector  decomposition,  albeit  of  the  transformed  data  instead. 

Recall  that  the  GDA  is  a  supervised  scheme.  Therefore,  issues  dealing  with  algo¬ 
rithm  generalization  are  still  an  issue,  which  in  this  case  depends  on  the  geometric  char¬ 
acteristics  of  the  training  data,  and  not  on  its  dimensionality.  As  a  result,  a  judicious  se¬ 
lection  of  the  mapping  function  (p  is  a  significant  factor  in  insuring  the  reliability  of  the 
results  and  a  lower  computational  load. 

2.  Algorithm  Description 

This  section  describes  the  GDA  derivation  and  closely  follows  the  presentation 
given  by  [Baudat,  2000],  Assume  that  x  is  a  column  vector  representing  a  reshaped  image 
of  the  data  matrix  A  containing  the  M  images  included  in  the  training  dataset.  Further, 
assume  that  A  represents  the  matrix  containing  the  nt  images  of  class  i,  where 
i  =  1,...,  N,  and  N  is  the  total  number  of  classes  under  study.  Then,  the  total  number  of 
images  contained  in  the  training  dataset  is  given  by: 

M  =  fjni,  (3.12) 

i= 1 

and  the  data  covariance  matrix  C  is  defined  as: 
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(3.13) 


1  M 

c=— y 

A/f 


assuming  that  the  training  data  is  centered  (i.e.,  has  mean  equal  to  0). 


Next,  the  input  space  X  must  be  mapped  into  a  Hilbert  space,  F,  by  using  a 
nonlinear  mapping  function  (p : 


$ v:X->F . 
x  — »  (p{x). 


(3.14) 


Then,  the  covariance  matrix  in  the  resultant  feature  space,  F,  is  given  by: 

1  M 

v=—  (3-15) 

assuming  that  the  data  is  centered  in  the  transformed  space. 

Next,  the  covariance  matrix  B  that  is  derived  from  the  class  centers  expressed  in 
the  transformed  space,  F,  is  defined  as: 

M  tt 


where  (pt  represents  the  mean  value  of  class  i,  and  is  defined  as: 

«=-!>(*«>>  (3-17) 

n,  k= i 

where  xik  is  element  k  of  the  class  i. 

The  data  covariance  matrix  may  be  expressed  in  the  transformed  space,  F,  as  fol¬ 
lows: 

v  =  4 (3.18) 

Note  that  the  data  covariance  matrix  V defined  above  in  Eq.  (3.18)  is  expressed  in 
terms  of  dot  products  only.  The  main  advantage  behind  kernel-based  schemes  is  that 
nonlinear  algorithms  may  be  implemented  in  terms  of  expressions  involving  only  the  dot 
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product  on  the  data  considered,  i.e.,  without  having  to  explicitly  define  and  use  the  actual 
nonlinear  transform  expression  (p{x).  The  dot  product  expression  obtained  for  data  xi  and 

Xj  is  defined  by  the  following  kernel  function: 

ky  =  k{xt  ,Xj)  =  (pT  (x,  )<p(x/).  (3.19) 

When  dealing  with  different  data  classes,  the  kernel  expression  becomes: 

(ky)P9=<pr(xpM\)>  (3-20) 

where  p  and  q  are  data  classes. 

Next,  define  the  (MxM)  dimensional  matrix  K  as: 

K  =  (Kpq)p__LN,  (3.21) 

q=\..N 

where  Kpq  are  (npx  nq  j  dimensional  sub-matrices  defined  as: 

Kpq=(k,j),  (3-22) 

with  i  =  \,...,np,  j  =  p  =  \,...,N,  and  q  =  l,...,N. 

Next,  define  the  M-dimensional  block  diagonal  matrix  W,  where  each  block  If  is 
of  dimension  (ni  x  « . ) ,  and  given  by: 

i  ...  n 

(3.23) 

i  -  ij 

where  ni  represents  the  number  of  the  data  samples  of  class  i. 

Recall  that  LDA  maps  the  input  space  into  one  where  the  principal  components 
maximize  the  ratio  of  the  between-class-scatter  matrix  (also  known  as  the  inter-class 
inertia)  and  the  within-class-scatter  matrix  (also  known  as  the  intra-class  inertia).  The 
use  of  kernel  functions  expands  the  LDA  application  to  include  the  nonlinear  transformed 
space.  The  principal  components  in  the  resultant  space  are  nonlinearly  related  to  the  input 
variables  and  the  kernel  function,  k,  contributes  to  the  creation  of  the  function  that 
nonlinearly  separates  the  classes.  Following  the  findings  used  in  the  LDA  derivation,  the 
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v  Bv 

GDA  maximizes  the  Rayleigh  quotient  — —  of  the  between-class  scatter  matrix  over 

v  Vv 


the  within-class  scatter  matrix,  where  these  V  and  B  matrices  (defined  previously  in  Eqs. 
(3.15)  and  (3.16))  are  defined  in  the  transformed  space.  Therefore,  the  LDA  maximiza¬ 
tion  problem  is  equivalent  to  solving  the  following  eigenvalue-eigenvector  decomposi¬ 
tion  problem: 

AVv  =  Bv,  (3.24) 

where  A  and  v  are  the  resulting  eigenvalues  and  eigenvectors,  respectively. 

The  eigenvectors  v  can  be  expressed  as  a  linear  combination  of  elements  ex¬ 
pressed  in  the  transformed  space  as: 

v =TJiapMx,’^  (3-25) 

P= 1 


where  ann,„_,  v,  .  „  ,  are  defined  as  the  coefficients  of  the  linear  combination.  Note  that 
the  coefficient  vector  a  =  (a  pq  )/)=l  N.q=l  n  can  be  expressed  in  a  more  compact  form  as 
a  =  (ap)p=\  n  ’  where  ap  =  (apq)q=l  n  is  defined  as  the  coefficient  of  the  vector  v  in  class 
P- 


Baudat  showed  that  the  Rayleigh  quotient  expressed  in  the  transformed  space, 
may  be  rewritten  as  [Baudat,  2000]: 


vtBv  _  aT KWKa 

Vvv  ~  ~7KKa 


(3.26) 


Therefore,  maximizing  the  above  Rayleigh  quotient  can  be  executed  by  solving  the  fol¬ 
lowing  generalized  eigenproblem: 

KWKa  =  AKKa.  (3.27) 

Note  that  the  matrix  K  is  symmetric  and  positive  definite  when  the  kernel  type  se¬ 
lected  satisfies  Mercer  theorem  [Scholkpof,  1998].  Therefore,  K  may  be  expressed  as: 

K  =  PYPt,  (3.28) 
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where  P  and  r  are  the  eigenvector  and  nonsingular  diagonal  eigenvalue  matrices,  respec¬ 
tively.  In  addition,  P  has  full  rank  when  K  is  positive  definite  and  the  eigenvector  matrix 
P  is  unitary,  i.e.,  PPT  =  PTP  =  I. 

At  this  point,  replacing  the  matrix  A  by  its  eigendecomposition  in  Eq.  (3.27)  leads  to: 

(PTPt)W(PTPt)cc  =  A(PTPT)(PTPT)a.  (3.29) 

Recall  that  the  eigenvector  matrix  P  is  unitary.  Thus,  Eq.  (3.29)  becomes: 

(PT PT  )W  (PT PT  )a  =  X(PT)(TPT)a.  (3.30) 

Using  the  fact  that  the  eigenvalue  matrix  P  and  T  have  full  rank  (i.e.,  are  invertible  matri¬ 
ces)  when  the  kernel  type  is  selected  to  satisfy  Mercer  theorem,  Eq.  (3.30)  may  be  simpli¬ 
fied  to: 

{PT)W{PYPT)a  =  X{YPT)a.  (3.31) 

In  order  to  simplify  the  following  calculations,  a  new  variable  (i  is  defined  as: 

P  =  TPt  a.  (3.32) 

Then,  replacing  Eq.  (3.32)  into  Eq.  (3.31)  leads  to: 

(PT)WPp  =  hp.  (3.33) 

Therefore,  maximizing  the  Rayleigh  quotient  shown  in  Eq.  (3.26)  above  may  be 
obtained  by  first,  solving  the  eigenproblem  derived  in  Eq.  (3.33)  in  order  to  obtain  ft ;  and 

second,  by  computing  the  coefficient  vector  a  from  /?,  as  a  =  PY~X  (3 .  Note  that  the  ac¬ 
tual  a  coefficients  are  not  unique,  as  only  the  direction  of  the  eigenvectors  obtained  by 
solving  an  eigenproblem  is  unique.  The  a  coefficients  can  be  computed  by  setting  the 
vectors  v  to  norm  1.  Thus,  taking  into  consideration  Eq.  (3.26),  the  following  expression 
is  obtained: 

vrv  =  l  (3.34) 

=> vTv = Y^Y^aP<,ay^Mxih) = L  (3-35) 

p=\  <7=1  l-l  h= 1 
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Therefore, 


N  N 


T 

V  V 


p= i  ;=i 


(3.36) 


which  yields: 

vTv  =  aTKa  =  1,  (3.37) 

where  the  dimension  of  Kpl  is  equal  to  ( np  x  j ,  as  the  dimensions  of  ap  and  at  are 

[np'x  l)  and  ( x  l), respectively.  Thus,  the  coefficients  a  need  to  be  divided  by  \jarKa, 

in  order  to  normalize  the  vectors  v  to  norm  1 . 

Finally,  testing  data  z  may  be  projected  into  the  nonlinear  feature  space  by  pro¬ 
jecting  them  as: 


v>(z)=xix/(xw’z)-  (3-38) 

P= 1  9=1 

3.  Kernel  Selection 

Recall  that  the  basic  idea  of  the  main  kernel-based  schemes  is  to  project  the  data 
into  a  feature  space  via  a  nonlinear  projection  (p  and  to  apply  linear  schemes  in  that  new 
space.  The  kernel  function  required  for  this  implementation  is  expressed  in  terms  of  dot 
products  as: 

k{x,y)  =  (p{x)-(p(y).  (3.39) 


Currently,  several  types  of  kernels  are  being  used  in  kernel-based  implementa¬ 
tions.  The  three  most  significant  are: 

•  The  polynomial  kernel  of  the  form:  k(x,  y)  =  (x  •  y  )d ,  where  d  >  1  has 

been  used  extensively.  Note  that  the  GDA  implementation  reverts  back  to 
the  classical  LDA  implementation  when  d  =  1 .  In  addition,  [Liu,  2004]  re¬ 
cently  investigated  polynomial  kernel  functions  with  fractional  powers 
(i.e.,  when  0  <  d  <  1 ),  which  can  be  applied  to  face  recognition  applica¬ 
tions,  even  though  such  studies  may  not  necessarily  satisfy  the  Mercer 
theorem,  and  may  not  be  considered  as  valid  kernel  functions.  Neverthe¬ 
less,  Liu  showed  in  his  simulations,  (when  using  30  features  or  more  on 
visible  imaging  data),  that  fractional  degree  polynomial  kernels  lead  to 
higher  recognition  rates  than  do  those  obtained  with  integer-degree  poly¬ 
nomial  kernels  [Liu,  2004], 
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•  The  Gaussian  kernel  function  defined  as: 

k(x,y)  =  e  ,  (3.40) 

where  the  parameter,  cr ,  has  to  be  selected  arbitrarily. 

•  The  sigmoid  kernel  defined  as: 

k(x,  y )  =  tanh(A:(xy)  +  5),  (3.41) 

where  0  <  k  and  3  <  0.  The  sigmoid  kernel  function  has  been  successfully  used  in  prac¬ 
tice,  (especially  in  SVM  applications),  even  though  it  does  not  actually  define  a  positive 
semi-definite  Gram  matrix  and  thus,  is  not  a  valid  kernel  function  [Liu,  2004], 

4.  Application  to  Iris  Data 

The  GDA  scheme  is  first  applied  to  a  classical  dataset  commonly  used  in  classifi¬ 
cation  applications  for  benchmarking  purposes,  namely  the  “Iris”  dataset  [Baudat,  2000]. 
The  specific  Iris  data  consists  of  four  real  parameters  of  the  flower  Iris  divided  into  three 
classes,  each  with  50  elements  of  dimension  four.  The  Iris  data  consists  of:  the  sepal 
length,  the  sepal  width,  the  petal  length,  and  the  petal  width,  with  all  measurements  ex¬ 
pressed  in  millimeters.  The  three  classes  represent  three  different  types  of  Iris  flowers: 

Iris  setosa,  Iris  versicolor,  and  Iris  virginica.  The  Iris  dataset  is  interesting,  as  one  class  is 
linearly  separable  from  the  other  two,  whereas  the  other  two  are  not  linearly  separable 
from  each  other.  As  a  result,  linear  classifiers  are  unable  to  separate  the  three  classes  very 
well.  Next,  we  will  show  that  the  GDA  is  able  to  separate  all  three  classes  well. 

Using  the  regular  LDA  leads  to  two  overlapping  projected  classes,  with  only  one 
being  separable  from  the  other  two.  Such  overlapping  is  shown  in  Figs.  3.4  and  3.5, 
which  plot  the  Iris  data  1-and  2-dimensional  projections  in  the  LDA-derived  directions 
[Baudat,  2000]. 
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1-D  Projected  Training  Iris  Data  using  LDA 


Figure  3.4.  1-Dimensional  LDA  projection  of  Iris  data.  Classes  2  and  3  are  not  sepa¬ 
rable. 
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2-D  Projected  Training  Iris  Data  using  LDA 


Figure  3.5.  2-Dimensional  LDA  projection  of  Iris  data.  Classes  2  and  3  are  not  sepa¬ 
rable. 

Next,  Figs.  3.6,  3.7,  and  3.8  plot  the  2-dimensional  class  projections  obtained  us¬ 
ing  the  GDA  with  a  Gaussian  kernel,  [Baudat,  2000],  where  different  values  of  the  arbi¬ 
trary  spread  parameter  o  were  selected  (cr  =  0.3, 0.7, 7 ) .  The  plots  were  produced  using 

the  software  available  from  [Baudat,  October  2000].  Results  demonstrate  that  all  three 
projected  classes  are  widely  separated  from  the  other  two.  It  should  be  noted  that  the  pro¬ 
jected  data  spread  increases  with  increasing  values  of  the  spread  parameter,  which  may 
result  in  nonlinearly  separable  projected  data,  as  shown  in  Fig.  3.8. 
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2-D  Projected  Training  Iris  Data  using  GDA-  Gaussian  Kernel-variance=0.3 


Figure  3.6.  GDA  projection  of  Iris  data.  All  classes  are  separable,  but  the  element  vec¬ 

tors  within  the  classes  are  not  discernible  (cr  =  0.3^  Small  variation.). 
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2-D  Projected  Training  Iris  Data  using  GDA-  Gaussian  Kernel-variance=0.7 


Figure  3.7.  GDA  projection  of  Iris  data.  All  classes  are  separable  (a  =  0.7^  Average 

variation.). 
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2-D  Projected  Training  Iris  Data  using  GDA-  Gaussian  Kernel-variance=7 


Figure  3.8.  GDA  projection  of  Iris  data.  The  two  classes  are  not  entirely  separable 

(<7  =  7  Large  variation.). 


C.  CONCLUSIONS 

This  chapter  presented  the  basic  concepts  behind  kernel-based  algorithms  and 
their  applications  to  classification  problems.  It  also  described  the  GDA  implementation, 
which  constitutes  the  main  research  tool  of  this  study. 

The  following  chapter  discusses  the  application  of  the  GDA  to  the  IR  face  data¬ 
base. 
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IV.  GDA  RESULTS 


This  chapter  considers  the  application  of  the  GDA  approach  to  the  IR  face  data¬ 
base.  First,  it  investigates  how  the  specific  type  of  distance  metrics  selected  in  the  algo¬ 
rithm  may  affect  classification  performances.  Second,  the  concepts  of  rank  score  and  con¬ 
fidence  interval  are  reviewed  and  it  is  shown  how  they  can  be  applied  toward  analyzing 
and  comparing  classifier  performances. 

Next,  we  apply  the  above  distance  measures  to  the  GDA  implementation  consid¬ 
ered  in  this  study.  Specifically,  we  investigate  the  link  between  distance  metrics  and  re¬ 
sulting  classification  performances  obtained  using  the  IR  database.  We  also  investigate 
the  links  between  kernel  function  type,  and  number  of  eigenvectors  selected  in  the  GDA 
projection  operation  and  the  resulting  classifier  performances. 

A.  THEORETICAL  BACKGROUND 

1.  Introduction 

First,  this  section  considers  various  types  of  distances  commonly  used  in  pattern 
classification  applications.  Second,  it  reviews  the  concepts  of  rank  score,  confidence  in¬ 
terval,  and  confidence  score.  These  parameters  are  used  to  evaluate  how  specific  dis¬ 
tances,  kernel  types,  and  number  of  eigenvectors  extracted  from  the  kernel  matrix  K  in¬ 
fluence  the  classifier  performances. 

2.  Distances  Considered 

According  to  [Duda,  2001],  pattern  classification  is  a  one-to-one  assignment  data 
to  a  class  belonging  to  a  set  of  pre-defmed  element  classes.  This  assignment  is  based  on 
the  concept  of  nearest  neighbor  classifier ,  which  is  in  turn  based  on  the  minimization  of  a 
metric  or  distance  function  between  testing  data  and  each  class  centroid. 

Distances  are  measures  which  evaluate  the  difference  between  two  vectors  v  and 
w.  Many  types  of  distances  have  been  commonly  used  in  engineering  applications,  such 
as  the  Euclidian  Norm,  Norm  1,  etc.  Some  distances  are  known  to  perform  better  than 
others  depending  on  the  data  and  the  specific  pattern  classification  algorithm  considered. 
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The  following  section  investigates  five  different  distance  measures  commonly  used  in 
pattern  classification  applications  which  are  considered  in  this  study.  We  will  select  the 
distance  leading  to  best  overall  classification  performances  in  the  rest  of  the  study. 

The  following  five  distances  are  considered,  where  it  is  assumed  that  the  vectors  v 
and  w  are  real-valued  and  of  dimension  (Nxl)  [Socolinsky,  2003]: 

1 .  Euclidean  Norm  2 : 

r iv  ~ 

L2(v,w)  =  .[£i(vi-wi)  ,  (4.1) 


2.  Mahalanobis: 

M(v,  w)  =  (v  -  w)T  IT1  (v  -  w), 


where  Z  is  the  data  covariance  matrix  defined  as  [Bishop,  1995]: 


2  =  E 


(v- w)(v- w)7 


(4.2) 


(4.3) 


3.  Mahalanobis  Norm  1: 

Ml(v,w)  =  YJ  <?i 2  I  v,.  -  w,  |,  (4.4) 

i= 1 

where  cr  is  an  estimate  of  the  data  variance  along  the  ith  coordinate  direction,  and  vi,wi 
are  the  ith  vector  coordinates  of  v  and  w  respectively, 

4.  Mahalanobis  Norm  2: 

r iv  ~ 

M2(v,w)  =  -w,)  »  (4-5) 


5. 


Mahalanobis  Angular: 


N 


Mang  (v,  w)  =  arccos 


L2(0,  v)L2(0,w) 


(4.6) 


3.  Rank  Score 

The  notion  of  rank  score  (also  known  as  cumulative  matching  score)  is  based  on 
the  probability  of  guessing  the  data  class  assignment  correctly  [Philips,  1996],  As  a  re- 
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suit,  a  rank  score  equal  to  one  (referred  to  as  rank-1  score)  is  defined  as  the  probability  of 
making  a  correct  class  assignment.  Accordingly,  a  rank  score  equal  to  2  (i.e.,  rank-2 
score)  corresponds  to  the  probability  of  making  a  correct  class  assignment  in  the  top  (i.e., 
most  likely)  two  decisions,  and  rank-N  score  refers  to  as  the  probability  of  making  a  cor¬ 
rect  assignment  within  the  top  N  candidates.  Extending  this  concept,  a  rank-C  score  de¬ 
fined  on  a  C-class  data  is  equal  to  1 ,  as  all  potential  classes  are  included.  In  addition,  a 
classifier  with  a  100%  classification  accuracy  has  a  rank  score  plot  with  rank-1  score 
equal  to  1 .  Therefore,  the  rank  score  value,  expressed  as  a  function  of  the  rank  index,  is  a 
monotonically  increasing  function  in  [0,  1],  and  is  commonly  used  in  evaluating  classifier 
performances  [Philips,  1996]. 

Assume  that  Fig.  4.1  represents  the  rank  score  obtained  for  some  classifier.  This 
example  shows  that  the  probability  of  making  a  correct  class  decision  (obtained  for  rank- 
1  score)  is  equal  to  0.8.  Next,  the  probability  of  making  a  correct  decision  in  the  first  or 
second  top  decisions  (represented  with  the  rank-2  score  quantity)  is  equal  to  0.84.  The 
example  also  shows  that  all  correct  guesses  are  contained  in  the  first  top  five  guesses,  as 
the  rank-5  score  is  equal  to  1 .  In  the  following  sections,  references  to  specific  classifica¬ 
tion  performance  measurements  will  refer  to  rank-1  score  values,  unless  specified  other¬ 
wise. 
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Figure  4. 1 .  Rank  score  example. 


4.  Confidence  Interval  Definitions 

Recall  that  the  sample  mean  computed  from  finite  data  only  approximates  the 
(true)  population  mean,  with  accuracy  depending  on  the  data  size  [Jain,  1991],  A  useful 
statistical  measure  in  addition  to  the  sample  mean  is  the  associated  confidence  interval, 
which  provides  the  user  with  a  measure  of  reliability  of  the  observed  mean.  Ideally,  a 
confidence  interval  width  should  be  as  small  as  possible,  because  the  true  mean  lies 
within  the  confidence  interval  with  some  predefined  probability  and  thus  it  is  easier  to 
estimate  its  true  value. 


Next,  the  concept  of  confidence  interval  (Cl)  is  briefly  reviewed.  Assume  that 
there  are  n  observations  [x1;x2,...,xn]  obtained  for  one  trial  of  the  whole  population  of 

observations  (i.e.,  measurements)  [Jain,  1991],  Further,  assume  that  the  population  has 
true  mean  //,  standard  deviation  a,  and  sample  mean  Ji .  Note  that  the  true  mean  //  is  a 
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deterministic  number,  whereas  the  sample  mean  ^  is  a  random  number.  As  a  result, 
sample  means  computed  from  different  data  trials  will  be  close  to  the  true  mean,  but  will 
most  likely  be  different  from  each  other. 

The  fact  that  the  sample  size  is  finite  means  that  Ji  is  just  an  estimate  of  the  true 
mean  /l  Thus,  it  is  quite  useful  to  estimate  the  corresponding  confidence  interval  \bvb2  ] 

defined  around  ju ,  as  this  information  gives  the  user  a  measure  of  confidence  in  the  esti¬ 
mated  true  mean.  The  values  bvb2  are  called  the  Cl  lower  and  upper  bounds,  respec¬ 
tively.  The  measure  of  confidence  is  defined  as  l- a,  where  a  is  called  the  significance 
level,  and  100  -(l-«)  is  called  the  confidence  level.  This  definition  means  that  there  is 
100  -(l  —  a)%  probability  that  the  true  mean,  computed  from  multiple  trials,  lies  in  the 
derived  confidence  interval  [6,,/y],  However,  one  trial  can  be  selected  to  estimate  the 
confidence  interval  associated  with  the  sample  data  mean  Ju,  provided  the  sample  size  n 
is  large  enough,  as  a  result  of  the  central  limit  theorem.  In  such  a  case,  Ji  can  be  shown 

to  be  approximately  Gaussian  with  mean  ju  and  standard  deviation  ~^=,  where  n  and  cr 

are  the  sample  size  and  standard  deviation  of  the  data  sequence,  respectively. 


It  can  be  shown  that  the  data  mean  confidence  interval  can  be  expressed  generally 
as: 


Ma] 


(7 


(4.7) 


where  t. 


1 — ;n- 
2 


\n- 1 


is  obtained  from  the  Student’s  t-distribution,  where  n  - 1  represents  the 


number  of  degrees  of  freedom  and  n  is  the  sample  size.  However,  for  large  sample  sizes, 
the  t-distribution  values  can  be  approximated  by  the  normal  distribution  values.  There¬ 
fore,  given  that  the  sample  size  is  1000  for  this  study,  the  Cl  is  computed  using  the  nor¬ 
mal  distribution  as: 


[J,b2\ 


tu-z  a-^=,M  +  z 

l~2  '■ 


(4.8) 
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In  this  study  we  select  a  95%  confidence  interval,  i.e.,  a  =  0.05,  which  leads  to  a 

value  of  z  a  =  z  005  =1.960,  [Jain,  1991].  As  a  result,  the  corresponding  95%  confi- 

1  2  *  ~ 

dence  interval  is  defined  as: 


[b„b2] 


/7 - 1. 960 -^,/7  + 1.960-^  , 
Vh  \jn  _ 


(4.9) 


where  a  and  // ,  and  n  are  the  standard  deviation,  the  estimated  mean,  and  the  sample  size 
of  the  measured  data  x.  The  value  of  z  „  for  the  Gaussian  distribution  can  be  found  in 

i-- 

2 

look-up  tables  contained  in  statistical  textbooks  and  others  [Jain,  1991], 


Confidence 

Interval 


Figure  4.2.  Mean  confidence  interval. 


5.  Confidence  Score 

Confidence  scores  are  scalar  quantities  defined  in  the  range  [0,  1]  which  provide 
an  evaluation  of  the  confidence  in  the  class  decision.  As  a  result,  the  confidence  score 
ui}  of  image  j  represents  the  likelihood  that  it  belongs  to  class  i.  The  confidence  score 

value  is  defined  for  a  C-class  problem  as: 


uij=- 


2  ’ 


k= 1 


dy  Yl"1-1) 

d,.  i 

V  kJ  J 
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(4.10) 


where  d tj  represents  the  distance  between  image  j  and  the  ith  class  centroid,  and  m  is  a 
weighting  exponent  which  can  take  values  between  [1,  oo]  [Jang,  1997],  We  selected 
m  =  2  in  this  study.  Various  distance  types  may  be  selected  in  the  confidence  score  defi¬ 
nition.  In  this  study  we  selected  the  Mahalanobis  Angular  distance  as  it  closely  approxi¬ 
mates  classification  performances  obtained  with  the  Mahalanobis  distance  at  a  reduced 
computational  cost.  Further  details  regarding  the  distance  metric  selection  are  discussed 
in  Chapter  IV.B.4. 

6.  Kernel  Matrix  K  Eigen-Decomposition  Issues 

An  important  parameter  in  this  study  is  the  number  of  top  eigenvectors  of  the  ker¬ 
nel  matrix  used  in  the  derivation  of  the  GDA  projection  vectors.  Recall  that  eigenvectors 
associated  with  “negligible”  eigenvalues  may  be  discarded  in  the  computation  of  the 
GDA  projection  vectors  v ’s,  as  defined  in  Chapter  III.B.2,  since  they  are  expected  to 
have  little  impact  on  the  resulting  projected  data.  Further,  recall  that  the  kernel  matrix 
size  is  directly  related  to  the  number  of  training  data  samples  (900),  resulting  in  a 
(900x900)  matrix  for  our  study.  Figure  4.3  plots  the  eigenvalues  as  a  function  of  the 

eigenvalue  index  obtained  from  the  eigen-decomposition  of  the  kernel  matrix  K,  for  one 
typical  iteration  in  the  database.  Note  that  eigenvalue  magnitudes  decrease  quite  rapidly, 
and  that  those  corresponding  to  index  500  or  higher  do  not  seem  to  contribute  signifi¬ 
cantly  to  the  kernel  matrix  structure.  Further,  note  that  truncating  the  eigen- 
decomposition  can  result  in  significant  computational  savings. 

Therefore,  we  investigated  whether  truncating  the  kernel  matrix  K  eigen- 
decomposition  specifically  affected  the  resulting  classification  performances  with  the  fol¬ 
lowing  experiments: 

•  Keep  all  eigenvectors  assigned  to  eigenvalues  with  value  greater  than  or 
equal  to  the  maximum  eigenvalue  divided  by  1000  (unless  specified  oth¬ 
erwise,  this  is  the  referred  to  as  the  “default  case”  as  was  proposed  by 
Baudat  [Baudat,  October  2000],  For  practical  purposes,  simulations  have 
shown  that  this  choice  comes  down  to  keeping  about  the  top  first  700  ei¬ 
genvectors,  i.e.,  the  eigenvectors  associated  with  the  top  700  eigenvalues 
in  magnitude). 

•  Keep  the  top  100  eigenvectors. 

•  Keep  the  top  150  eigenvectors. 
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Eigenvalue  Magnitude 


•  Keep  the  top  200  eigenvectors. 

•  Keep  the  top  250  eigenvectors. 

•  Keep  the  top  500  eigenvectors. 

The  results  are  presented  in  Chapter  IV.B. 


x  iq8  GDA  K  Matrix  Eigenvalues 


Figure  4.3.  Eigenvalues  of  the  kernel  matrix  K. 

7.  Kernel  Function  Types  Considered 

Various  types  of  polynomials  may  be  used  in  kernel-based  schemes,  as  men¬ 
tioned  earlier.  First,  the  Gaussian  kernel  was  selected,  as  it  has  been  used  with  success  in 
numerous  applications.  Next,  polynomials  of  different  degrees  were  considered  such  as: 

•  Polynomial  of  degree  1 :  k(x,  y)  =  (x  ■  y),  which  is  equivalent  to  the  LDA 
implementation  [Baudat,  2000], 

•  Polynomial  of  degree  2:  k(x,  y)  =  (x-  y)2 , 

•  Polynomial  of  degree  3 :  k(x,  y)  =  (x-  yf . 
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Results  showed  the  Gaussian  kernel  led  to  similar  classification  performances  as 
that  obtained  for  the  polynomial  of  degree  2,  for  the  selected  spread  value  cr  =  7000.  In 
addition,  it  was  noted  that  selecting  the  polynomial  kernel  allowed  us  to  efficiently  vec¬ 
torize  the  Matlab  code,  initially  written  by  Baudat  and  Anouar  [Baudat,  October  2000], 
and  significantly  reduce  the  associated  computational  load.  Details  regarding  the  vectori- 
zation  operations  are  presented  in  Appendix  B.  Specifically,  it  was  noted  that  the  Matlab 
implementation  with  Gaussian  kernel  took  about  7.5  to  8  minutes  per  iteration,  while  the 
vectorized  Matlab  implementation  with  polynomial  kernel  of  degree  2  took  about  75  sec¬ 
onds  per  iteration  in  the  database,  using  a  3-GHz  PC. 

B.  EXPERIMENT  DESCRIPTION 

1.  Data  Selection  Procedures 

The  database  considered  in  this  study  is  that  previously  used  in  [Lee,  2004],  and 
specific  details  regarding  the  collection  process  and  data  manipulation  can  be  found 
within.  The  database  includes  50  classes  (adult  subjects).  Each  subject  was  asked  to  gaze 
at  one  of  ten  points  on  the  wall  behind  the  camera  to  introduce  angle  and  tilt  variations  in 
the  subject  pose.  Each  class  has  the  following  three  sections: 

1 .  Photographed  in  a  neutral  pose. 

2.  Photographed  pronouncing  the  vowel  “u”. 

3.  Photographed  while  smiling. 

As  a  result,  the  complete  database  consists  of  (50xl0x3)  =  1500  images.  All  IR 

images  were  then  cropped  to  retain  only  the  middle  section  of  the  face,  which  excluded 
the  chin,  some  of  the  forehead,  and  the  ears,  resulting  in  cropped  images  of  dimension 
equal  to  (45  x  60)  pixels.  Next,  cropped  images  were  reshaped  as  column  vectors  of  di¬ 
mension  (2700  x  l) ,  resulting  in  an  overall  database  matrix  of  size  (2700x1 500) . 

The  data  were  split  into  nonoverlapping  training  and  testing  sets,  and  40%  and 
60%  of  the  data  per  class  were  selected  for  testing  and  training  sets,  respectively.  We  im¬ 
plemented  a  cross-validation  variant  to  estimate  classification  performances  based  on 
resampling  and  averaged  over  multiple  repetitions  of  each  experiment  (referred  to  as  an 
iteration  in  this  study).  Therefore,  for  each  iteration,  images  in  each  section  and  class  are 
randomly  permuted,  and  the  first  four  resulting  images  (40%)  of  each  section  and  class 
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were  selected  to  create  the  Testing  Image  Data  Matrix  (TEIDM),  while  the  last  six  im¬ 
ages  (60%)  were  selected  to  form  the  Training  Image  Data  Matrix  (TRIDM).  As  a  result, 
the  TEIDM  and  TRIDM  have  dimensions  equal  to (2700x600)  and (2700x900), respec¬ 
tively. 

2.  Number  of  Iterations  per  Each  Individual  Experiment 

A  significant  body  of  research  dealing  with  small  sample  size  experiments  has 
been  conducted  in  the  research  community,  and  most  studies  consider  some  type  of  aver¬ 
aging  operations  to  reduce  the  impact  of  the  data  split  between  training  and  testing  sets. 

In  this  study,  we  implemented  a  cross-validation  variant  based  on  resampling.  We  con¬ 
sidered  500  and  1000  iterations  (i.e.,  repetitions  of  each  experiment)  and  selected  1000 
iterations  to  collect  sufficient  reliable  statistical  data  regarding  classification  perform¬ 
ances. 

3.  Software  Implementation 

Four  main  categories  of  experiments  were  conducted  in  this  study: 

1 .  Implementation  of  the  Fisherface  algorithm,  using  the  GDA  with  polyno¬ 
mial  kernel  of  degree  1,  to  benchmark  results  against  those  obtained  ear¬ 
lier  by  Lee  [Lee,  2004], 

2.  Selection  of  the  best  distance  metric  to  be  applied  in  the  classifier.  This  se¬ 
lection  was  conducted  using  3-dimensional  distance  map  plots,  which  are 
discussed  later  in  Chapter  IY.B.4. 

3.  Computation  of  rank  scores  and  overall  classification  performance  per 
class  for  40/60  cross-validation  variant  and  1000  iterations. 

4.  Computation  of  95%  confidence  intervals  for  rank-1  to  rank-5  scores 
mean  classification  results. 

All  the  software  code  developed  for  these  simulations  is  included  in  Appendix  B. 
Note  that  the  software  implementation  used  in  this  study  was  either  written  by  the  author 
or  modified  from  that  developed  earlier  by  Baudat  [Baudat,  October  2000].  The  main 
steps  associated  with  a  given  iteration  are  listed  in  Table  4.1. 
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Data  Selection 

1 .  Execute  one  random  permutation  of  the  images  within  each  section 
and  class,  as  discussed  in  Chapter  IV.B.l  above. 

2.  Generate  TRIDM  and  TEIDM  matrices,  as  described  in  Chapter 

IV.B.l. 

GDA  steps 

3.  Apply  the  GDA  algorithm  to  the  TRIDM  to  create  the  kernel  matrix 

K. 

4.  Compute  the  eigen-decomposition  of  the  kernel  matrix  K. 

5.  Project  TRIDM  and  TEIDM  matrices  onto  the  kernel  matrix  K. 

Classification 

steps 

6.  Compute  the  class  centroids  from  the  projected  TRIDM  information. 

7.  Compute  the  projected  training  images  covariance  matrix  (needed  for 
the  Mahalanobis  distance  considered  in  the  study). 

8.  Compute  the  distance  between  projected  training  data  and  class  cen¬ 
troids  and  between  projected  testing  data  and  class  centroids. 

9.  For  each  distance  considered  and  projected  training  and  testing  data 
sample,  make  class  decision  by  selecting  as  class  that  which  leads  to 
the  smallest  distance  between  projected  data  and  class  centroids. 

Table  4. 1 .  Main  software  steps  required  for  a  given  iteration. 

4.  Distance  and  Eigenvector  Impacts  on  Classification  Results 

Recall  that  the  classifier  scheme  analyzed  in  this  study  belongs  to  the  family  of 
distance  classifiers.  Thus,  the  specific  choice  of  distance  may  be  an  important  parameter 
in  the  resulting  algorithm  performance.  As  a  result,  we  investigated  how  the  selection  of 
the  specific  distance  influenced  resulting  classifier  performances.  In  addition,  the  specific 
number  of  eigenvectors  selected  in  the  GDA  kernel  matrix  K  is  also  an  important  user- 
specified  parameter.  Therefore,  we  also  investigated  the  impact  the  number  of  eigenvec¬ 
tors  extracted  from  the  kernel  matrix  K  has  on  resulting  classifier  performances.  To  visu¬ 
alize  the  impact  these  two  parameters  have  on  the  classifier  performances,  we  generated 
3-dimensional  training  and  testing  distance  maps,  which  plot  the  distances  between  each 
training  (testing)  image  and  all  training  data  centroids,  respectively  for  the  five  distance 
types  discussed  earlier  in  Chapter  IV.A.2,  while  varying  the  number  of  top  eigenvectors 
in  the  kernel  matrix  K. 

Figure  4.4  illustrates  the  Matlab  color  code  variations  used  in  Figs.  4.5  to  4.14., 
which  investigate  the  classifier  performance  as  a  function  of  the  distance  used.  Table  4.2 
lists  all  experiments  conducted  according  to  the  various  types  of  distances  used,  the  type 
of  images  selected  (testing/training),  and  the  number  of  kernel  matrix  K  eigenvectors  se- 
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lected  for  the  projections.  These  simulations  were  repeated  ten  times  and  Figs.  4.5  to  4.14 
represent  a  typical  result,  (as  significant  consistency  was  observed  between  trials). 


Figure  4.4.  Matlab  color  code  scaling. 


Type  of  Distance  used 

Training  im¬ 
ages,  Default 

Training 
images,  250 

Testing  im¬ 
ages,  Default 

Testing  im¬ 
ages,  250 

Norm-2 

X 

X 

X 

X 

Mahalanobis 

X 

X 

X 

X 

Mahalanobis  Norm-2 

X 

X 

X 

X 

Mahalanobis  Norm-1 

X 

X 

X 

X 

Mahalanobis  Angular 

X 

X 

X 

X 

Table  4.2.  Distance  maps  computed.  “Default”  or  “250”  refers  to  the  number  of  ker¬ 
nel  matrix  eigenvectors  kept  in  the  GDA  algorithm  step. 


Each  figure  corresponds  to  a  specific  distance  and  number  of  kernel  matrix  K  ei¬ 
genvectors  used  in  the  GDA  projection  step,  and  is  subdivided  into  four  sections.  The  two 
sub-figures  in  the  top  row  represent  the  3-dimensional  distance  maps  obtained  for  train¬ 
ing  images  (left)  and  testing  images  (right),  respectively.  The  bottom  row  represents  the 
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corresponding  contour  plots.  Distance  maps  generated  are  of  dimension  (Jxf),  where  X 

represents  the  number  of  training  (equal  to  900),  or  testing  (equal  to  600)  images  on  the  x 
axis,  and  Y  (equal  to  50)  represents  the  number  of  classes  on  the  y  axis.  Finally,  the  verti¬ 
cal  z  axis  represents  the  measured  distance  between  each  sample  and  all  training  data 
class  centroids. 

The  distance  maps  are  set  up  so  that  diagonal  values  represent  the  distance  be¬ 
tween  each  image  and  its  own  class  centroid,  while  off-diagonal  elements  represent  dis¬ 
tances  between  images  and  centroids  from  other  classes.  As  a  result,  one  would  expect  to 
see  small  distances  (i.e.,  blue  colors)  on  the  diagonal  elements  and  larger  ones  (i.e.,  to¬ 
wards  red  colors)  on  the  off-diagonal  elements  to  represent  good  classification  behavior. 
The  “perfect”  classifier  should  exhibit  a  distance  map  showing  uniform  deep  blue  diago¬ 
nal  elements  and  uniform  red  off-diagonal  elements.  As  a  result,  these  distance  maps 
make  it  possible  to  evaluate  the  resulting  classifier  performances  visually  by  noting: 

•  How  uniformly  deep  blue  diagonal  elements  are  (deep  blue  color  on  the 
main  diagonal  insures  the  smallest  distances  are  observed  between  images 
and  centroids  from  their  own  class). 

•  How  red  off-diagonal  elements  are  (red  on  the  off-diagonal  elements  in¬ 
sures  larger  distances  between  images  and  other  class  centroids  are  ob¬ 
served). 

•  How  uniformly  red  the  off-diagonal  elements  are  (uniformly  red  color  on 
the  off-diagonal  elements  indicate  low  variance  in  the  results). 

In  addition,  results  show  that  classification  performances  increase  with  the  num¬ 
ber  of  kernel  matrix  eigenvectors  selected  in  the  GDA  computation,  except  for  one  spe¬ 
cific  case  discussed  later.  Results  also  show  that  better  performances  are  observed  when 
dealing  with  training  images  than  for  testing  images,  as  expected  because  the  centroids 
are  derived  from  the  training  data  itself. 
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Figure  4.5.  3-Dimensional  plots-contours  using  Norm-2  distance  and  default  number 

of  eigenvectors. 
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Figure  4.6.  3-Dimensional  plots-contours  using  Mahalanobis  distance  and  default 

number  of  eigenvectors. 
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Figure  4.7.  3-Dimensional  plots-contours  using  Mahalanobis  Norm-2  distance  and 

default  number  of  eigenvectors. 
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Figure  4.8. 


3-Dimensional  plots-contours  using  Mahalanobis  Norm-1  distance  and 
default  number  of  eigenvectors. 
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Figure  4.9.  3-Dimensional  plots-contours  using  Mahalanobis  Angular  distance  and 

default  number  of  eigenvectors. 
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Figure  4.10. 


3-Dimensional  plots-contours  using  Norm-2  distance  and  250  eigenvec¬ 
tors. 
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Figure  4. 1 1 .  3-Dimensional  plots-contours  using  Mahalanobis  distance  and  250  eigen¬ 
vectors. 
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Figure  4.12.  3-Dimensional  plots-contours  using  Mahalanobis  Norm-2  distance  and 

250  eigenvectors. 
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Figure  4.13.  3-Dimensional  plots-contours  using  Mahalanobis  Norm-1  distance  and 

250  eigenvectors. 
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Figure  4.14. 


3-Dimensional  plots-contours  using  Mahalanobis  Angular  distance  and 
250  eigenvectors. 
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5.  Rank  Score  and  Overall  Class  Performance 

Cumulative  Rank  Scores  (CRS)  and  average  classification  rates  are  computed  to 
investigate  the  impact  that  the  various  distances,  polynomial  degree  orders  and  numbers 
of  eigenvectors  from  the  kernel  matrix  K  selected  in  the  GDA  step  have  on  overall  classi¬ 
fier  performances.  All  results  provided  are  obtained  with  the  40/60  cross-validation  vari¬ 
ant  discussed  earlier  in  Section  B.l  and  1000  repetitions  (i.e.,  iterations)  to  insure  results 
are  statistically  meaningful.  Software  implementations  computing  rank  scores  and  aver¬ 
age  classification  rates  are  provided  in  Appendix  B. 

Figures  4.15  and  4.16  show  the  rank  score  and  classification  performance  on  a  per 
class  basis,  obtained  for  the  testing  images  dataset,  for  one  representative  iteration.  Addi¬ 
tional  random  iteration  rank  score  plots  are  included  in  Appendix  C. 


Cummulative  Rank  Score  for  600  Testing  Images 


Figure  4. 1 5.  Testing  data  CRS  plot;  Mahalanobis  Angular  distance,  default  number  of 
eigenvectors  selected  in  the  GDA  step,  one  iteration. 
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Classification  Performance 


Classification  Performance  per  Class 


Class  Index 


Figure  4. 16.  Testing  data  average  classification  rate;  Mahalanobis  Angular  distance, 
default  number  of  eigenvectors  selected  in  the  GDA  step,  one  iteration. 


Table  4.3  presents  average  classification  rates  and  associated  95%  confidence  in¬ 
tervals  obtained  for  all  scenarios  investigated  in  the  study. 

The  following  comments  can  be  made: 

•  Results  obtained  for  the  Fisherface  implemented  as  PC  A  and  GDA  with 

polynomial  kernel  of  degree  1  and  default  number  of  eigenvectors  selected 
in  the  GDA  step  are  similar  to  those  obtained  earlier  by  Lee  [Lee,  2004],  It 
is  also  noted  that  the  Euclidian  distance  leads  to  better  classification  re¬ 
sults  (94.59%)  than  the  Mahalanobis  Angular  distance  does  (93.70%). 
Therefore,  there  is  no  advantage  when  using  the  Fisherface  in  selecting  the 
Mahalanobis  Angular  distance. 
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•  Results  show  that  when  selecting  a  polynomial  kernel  of  order  2  in  the 
GDA  step  and  keeping  eigenvectors  as  specified  in  Baudat  [Baudat,  Octo¬ 
ber  2000]  (shown  in  row  3  of  Table  4.3),  the  best  average  classification 
rates  are  obtained  with  the  Mahalanobis  distance  (98.44%),  closely  fol¬ 
lowed  by  the  Mahalanobis  Angular  distance  (98.41%).  Recall  that  the  Ma¬ 
halanobis  Angular  distance  is  less  computationally  intensive  as  it  does  not 
require  computing  the  data  covariance  matrix  inverse.  As  a  result,  the  Ma¬ 
halanobis  Angular  distance  was  selected  for  all  follow-on  experiments, 
and  all  other  results  are  based  on  this  metric,  unless  specified  otherwise. 

•  Figures  4. 1 1  and  4. 14  show  that  using  only  250  kernel  matrix  K  eigenvec¬ 
tors  and  the  Mahalanobis  Angular  distance  appears  to  lead  to  better  sepa¬ 
ration  between  intra-class  and  inter-class  distance  values,  and  more  con¬ 
sistent  inter-class  distance  values,  than  observed  with  the  Mahalanobis 
distance. 

•  Results  show  that  reducing  the  number  of  kernel  matrix  K  eigenvectors  se¬ 
lected  in  the  GDA  step  usually  results  in  classification  performance  degra¬ 
dations.  Specifically,  the  investigation  was  conducted  by  selecting  the  top 
100,  150,  200,  250,  500,  and  the  default  number,  as  defined  by  Baudat,  for 
a  given  distance  metric.  Results  show  that  classification  performances  de¬ 
grade  as  the  number  of  eigenvectors  decreases,  with  the  exception  of  the 
slightly  better  performance  observed  with  500  eigenvectors  (98.55%)  than 
with  the  default  number  (around  700)  (98.41%),  which  it  is  not  viewed  as 
statistically  significant.  Overall,  eigenvectors  associated  with  very  small 
eigenvalues  are  expected  to  have  small  impact  on  the  resulting  classifica¬ 
tion  performances. 

•  Additionally  the  GDA  with  polynomial  kernel  of  degree  1  (i.e.,  LDA)  was 
investigated  directly  and  resulted  in  lower  classification  performances  than 
those  obtained  with  the  Fisherface  implementation. 

•  Results  show  a  very  slight  increase  in  classification  performances  by  in¬ 
creasing  the  polynomial  degree  to  3  (98.45%)  from  2  (98.41%),  when  se¬ 
lecting  the  default  number  of  kernel  matrix  K  eigenvectors. 
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Overall  Classification  rates:  Mean  and  95%  confidence  interval 


Algorithm 

selected 

(polynomial  deg. 
order,  number  of 
eigenvectors  se¬ 
lected  in  the 
GDA  step) 

DISTANCE  SELECTED 

Norm-2 

Mahalanobis 

Mahalanobis 

Norm-2 

Mahalanobis 

Norm-1 

Mahalanobis 

Angular 

GDA 

(1,  default) 

0.9273 

[0.9266,  0.9280] 

0.9282 

[0.9275,  0.9289] 

Fisherface 
(PCA+GDA 
(1,  default)) 

0.9459 

[0.9453,  0.9466] 

0.9370 

[0.9263,  0.9376] 

GDA 

(2,  default) 

0.9826 

[0.9822,  0.9831] 

0.9844 

[0.9840,  0.9848] 

0.9826 

[0.9822,  0.9831] 

0.9799 

[0.9795,  0.9804] 

0.9841 

[0.9837,  0.9845] 

GDA 
(2,  500) 

0.9855 

[0.9851,0.9859] 

GDA 
(2,  250) 

0.9774 

[0.9769,  0.9778] 

GDA 
(2,  200) 

0.9670 

[0.9665,  0.9676] 

GDA 
(2, 150) 

0.9448 

[0.9442,  0.9455] 

GDA 
(2, 100) 

0.8944 

[0.8935,  0.8953] 

GDA 

(3,  default) 

0.9845 

[0.9841,0.9849] 

Table  4.3.  Classification  rates  for  testing  data;  40%/60%-cross-validation  variant, 
1000  iterations;  means  and  95%  confidence  intervals  as  a  function  of  the  kernel  function 
polynomial  degree,  distance  type,  and  number  of  eigenvectors  selected  for  the  GDA  step. 


6.  Mean  Performance  Confidence  Intervals 

Figure  4.17  plots  the  95%  confidence  intervals  obtained  for  rank-1  to  rank-5 
scores  mean  classification  rates  with  the  five  distances  investigated  in  our  study  and  ker¬ 
nel  function  polynomial  of  degree  2.  Results  show  that  best  rank-1  to  rank  -5  scores  are 
consistently  obtained  with  the  Mahalanobis  distance  (as  noted  earlier  for  rank-1  scores  in 
Table  4.3).  Results  also  show  that  the  Mahalanobis  Angular  distance  rank-1  to  -5  scores 
are  consistently  very  close  to  the  Mahalanobis  scores,  which  further  validates  the  selec¬ 
tion  of  the  Mahalanobis  Angular  distance  as  the  metric  of  choice  in  this  classifier  imple¬ 
mentation.  The  Mahalanobis  Angular  distance  has  better  performance  with  fewer  eigen¬ 
vectors  selected  than  the  Mahalanobis  distance. 
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Mean  Confidence  Intervals  of  first  5  Rank  Scores  for  Testing  Data 
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Figure  4.17.  95%  Confidence  interval  of  the  means  of  ranks  1,  2,  3,  4,  and  5  of  the  five 

distances  listed  on  the  legend. 

7.  Confidence  Score 

The  confidence  score  provides  the  user  with  a  measure  to  evaluate  the  confidence 
level  with  which  each  image  belongs  to  a  class.  This  parameter  is  bound  between  0  and  1 
and  is  useful  when  evaluating  the  classifier  decision  confidence,  and/or  when  fusing  vari¬ 
ous  classifier  decisions  on  the  same  data.  Figures  4.18  and  4.19  plot  representative  3- 
dimensional  confidence  score  maps  obtained  for  one  iteration.  The  dimension  of  the  ma¬ 
trix  plotted  is  again  equal  to  (Jxf),  where  X  corresponds  to  the  number  of  testing  im¬ 
ages  (600),  and  Y  represents  the  number  of  classes  (equal  to  50  in  our  study).  Note  that  a 
perfect  classifier  should  lead  to  a  confidence  score  map  with  diagonal  elements  equal  to  1 
and  off-diagonal  elements  equal  to  0.  Figures  4.18  (3-dimensional  plot)  and  4.19  (asso¬ 
ciated  contour)  show  a  confidence  score  map  that  has  diagonal  elements  significantly 
higher  than  off-diagonal  elements  with  average  values  equal  to  0.5  and  0.02,  respec¬ 
tively.  Thus,  the  results  show  good  average  performances  on  that  specific  iteration.  The 
software  implementation  for  this  computation  is  included  in  Appendix  B. 
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Score 


Testing  Images  Confidence  Score 


Testing  Image  Index  Class  |ndeK 

Figure  4.18.  Testing  set  3-dimensional  confidence  score  map  for  one  representative 

iteration. 
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Testing  Images  Confidence  Score 


Figure  4.19.  Testing  set  confidence  score  map  for  one  representative  iteration. 

C.  CONCLUSIONS 

This  chapter  presented  the  various  parameters  used  to  evaluate  the  GDA-based 
classification  schemes  considered  in  this  study.  We  defined  the  concept  of  distance  maps 
and  showed  how  they  can  be  used  to  visually  evaluate  classifier  performances.  We  also 
discussed  the  concept  of  average  classification  rates  and  95%  confidence  intervals,  and 
confidence  scores.  The  concepts  of  rank  scores  were  reviewed  and  applied  to  compare  the 
classifiers  investigated  in  this  study.  We  considered  five  different  types  of  distances  and 
investigated  the  impact  that  the  specific  number  of  eigenvectors  extracted  from  the  GDA 
kernel  matrix  K  and  selected  in  the  GDA  projection  step  has  on  resulting  classifier  per¬ 
formances.  Results  showed  that  the  optimum  performance  was  recorded  for  the  Maha- 
lanobis  distance,  used  with  polynomial  kernel  function  of  degree  2,  and  by  selecting  the 
top  500  eigenvectors  of  the  kernel  matrix  K  eigen-decomposition.  Results  also  show  that 
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the  classifier  set-up  with  the  Mahalanobis  Angular  distance  has  performances  closely 
approximating  those  obtained  with  the  Mahalanobis  distance,  at  a  much  reduced  compu¬ 
tational  cost.  Therefore,  the  Mahalanobis  Angular  distance  was  selected  as  the  metric  of 
choice  in  this  study. 
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V.  CONCLUSIONS 


This  study  extended  earlier  work  by  Pereira  [Pereira,  2002]  and  Lee  [Lee,  2004] 
who  investigated  the  recognition  of  infrared  face  images  collected  using  a  low  resolution 
uncooled  IR  camera  under  controlled  indoor  conditions.  This  study  considered  a  nonlin¬ 
ear  kernel-based  classifier,  the  Generalized  Discriminant  Analysis  (GDA)  approach  pro¬ 
posed  earlier  by  Baudat  [Baudat,  2000].  First,  two  basic  linear  classification  schemes 
were  reviewed,  the  PC  A  and  the  Fisherface  approach,  which  have  been  widely  used  in 
classification  applications.  Next,  the  basic  idea  behind  the  GDA  was  presented.  It  was 
shown  that  the  GDA  can  be  viewed  as  a  two-step  process.  First,  the  data  is  implicitly 
nonlinearly  projected  in  a  high-dimensional  feature  space  where  discrimination  becomes 
easier  to  be  performed.  Second,  the  LDA  is  implicitly  applied  to  the  transformed  data. 
Also  the  “kernel  trick”  concept  was  discussed,  which  is  the  fundamental  concept  behind 
the  GDA.  Finally,  we  proposed  a  vectorization  step  to  the  GDA  software  implementation 
proposed  by  Baudat,  which  allowed  us  to  speed  up  computations  by  a  factor  of  six. 

We  investigated  five  different  types  of  distances  and  selected  the  best  suited  for 
the  database  under  consideration  via  the  evaluation  of  3-dimensional  distance  maps.  The 
concepts  of  distance  classifiers,  rank  score,  confidence  interval  and  confidence  score 
were  reviewed  as  performance  measurement  tools.  A  cross-validation  variant  was  im¬ 
plemented  to  insure  that  the  obtained  results  were  statistically  significant.  Results  showed 
that  the  GDA  approach  leads  to  better  classification  performances  (98.55%)  than  those 
obtained  with  the  linear  classifiers  considered  in  the  two  earlier  studies,  and  increases  the 
average  classification  performance  by  5%  over  the  Fisherface  approach.  Results  also 
showed  that  a  combination  of  the  Mahalanobis  Angular  distance,  a  polynomial  kernel  of 
degree  2,  and  the  top  500  eigenvectors  lead  to  the  best  average  classification  perform¬ 
ances  for  the  GDA  implementation  for  the  database  under  consideration. 

In  conclusion,  this  study  has  shown  that  a  low-cost,  low-resolution  IR  camera 
combined  with  an  efficient  classifier  can  play  an  effective  role  in  security  related  applica¬ 
tions. 
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APPENDIX  A.  MATHEMATICAL  BACKGROUND 


This  appendix  presents  the  main  mathematical  concepts  used  in  the  derivations  of 
the  PC  A  and  Fisherface  approaches  discussed  in  Chapter  II.  First,  the  concept  of 
Rayleigh  quotient  is  reviewed.  Next,  linear  algebra  concepts  dealing  with  eigenvalues, 
eigenvectors,  and  rank  related  issues  are  discussed. 

A.  ANALYSIS 

1.  Rayleigh  Quotient  Analysis 

This  section  shows  that  computing  the  vector  W,  which  maximizes  the  Rayleigh 
quotient  to  obtain  the  Fisher  Discriminant  vectors  and  given  in  Eq.  (2.12),  Chapter  II.B.2: 


Wopt  =  al'g 


maxH, 


WtSbW  | 
WTSwW\J 


(A.1) 


is  equivalent  to  solving  the  generalized  eigenvalue  problem  shown  in  Eq.  (2.13),  Chapter 
II.  B.2: 


SHw  =  ASww. 


(A.2) 


Define  the  Rayleigh  quotient  expression  J(w)  as: 

I  WTS  W I 
J(W)  =  -  B 


W‘sww  I 


(A.3) 


The  vector  W  which  maximizes  J(  W )  may  be  obtained  by  computing  the  first  derivative 
ofd(fV)  and  making  it  equal  to  zero.  The  derivation  closely  follows  that  presented  in 
[Gutierrez,  2004]: 

~wtsbw' 


d[J(W)}_  [_W‘SnW 
dW  dW 


=  0: 


Jwr 

rT  i 


-[ 


dW  D  dW 

[WtSwW]2SbW-[WtSbW]2SwW  =  0. 


=  0 


(A.4) 


Dividing  the  above  equation  by  the  scalar  expression  W1  ShW  leads  to: 
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(A.5) 


WTSwW 

WTSwW 


SBW- 


IV%IV 

WTSwW 


SwW  =  0 


^SBW-JSwW  =  0 
=>  SBW  =  JSwW. 


Note  that  the  above  equation  is  a  generalized  eigenvalue  problem,  which  may  be 
rewritten  as: 


S~lSBW-JW  =  0, 


(A.6) 


which  leads  to: 


SwlSBW  =  JW, 


(A.7) 


when  Sw  is  a  non  singular  matrix.  Alternate  derivations  of  the  Fisher  Discriminant  vec¬ 
tors  when  the  matrix  Sw  is  singular  have  been  presented  by  [Cheng,  1992],  who  pro¬ 
posed  to  replace  the  singular  matrix  by  its  rank  decomposition.  However,  this  specific 
approach  does  not  appear  to  have  been  applied  in  the  face  recognition  field  and  is  not 
considered  further  in  our  study. 

2.  Computing  the  Number  of  Nonzero  Eigenvalues  of  the  Generalized 
Eigenvalue  Eigenvector  Decomposition  Problem 

We  show  in  this  section  that  the  following  generalized  eigen-problem, 

Sbw  =  ASww  (A.  8) 

where  SB  and  Sw  are  the  Between-Class-Scatter  (BCS)  and  Within-Class-Scatter 
(WCS)  matrices,  defined  below,  exhibits  at  most  C-lnon-zero  generalized  eigenvalues, 
when  Sw  is  not  singular,  where  C  is  the  number  of  the  classes. 

Recall  that: 

•  The  A-dimensional  matrix  SB  is  the  Between-Class-Scatter-Matrix,  is 
defined  as: 

c 

SB  =  ^  rij  ( mi  -  m){mi  -  m)T  (A.9) 

i= 1 

where  m  is  the  mean  of  all  the  element  vectors,  n:  and  mi,  are  the  number  of  element  vec¬ 
tors,  and  the  mean  vector  of  the  class  i,  respectively. 
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•  The  TV-dimensional  Sw  is  the  Within-Class-Scatter  Matrix,  is  defined  as: 

(A.  10) 

1=1  j= 1 

where  mj  and  ir  are  the  mean  vector  and  the  number  of  elements  vectors  of  the  class  i, 
respectively. 

Note  that,  when  SW  is  not  singular,  Eq.  (A.8)  may  be  rewritten  as: 

SwlSBW-JW  =  0,  (A.  11) 

which  leads  to 

SwlSBW  =  JW.  (A.  12) 

Next,  recall  that  the  rank  of  a  product  of  two  matrices  is  less  or  equal  to  the  minimum  of 
the  two  matrix  ranks  [Strang,  2003],  i.e., 

Rank(/)5)  <  m i n( Rank(4 ),  Rank(7? )) .  (A.  1 3) 

As  a  result,  the  rank  of  the  matrix  product  S~lSB  ( Rank  (A,).1 5^  ) )  is  less  or  equal 
to  the  smallest  rank  of  the  two  matrices  S'”1  and  5), . 

Further,  recall  that  the  rank  of  SB  is  equal  to  C  - 1  and  the  rank  of  Sw  is  equal  to 
P-C,  where  P  is  the  total  number  of  images,  as  will  be  discussed  in  Appendix  A.3.  Usu¬ 
ally  in  image  classification  applications,  C  -l«  P-C,  as  the  number  of  classes  is  much 
smaller  than  the  total  number  of  images.  Consequently,  we  have: 

Rank(S^Ss )  <  min  ( C  - 1,  P  -  C)  =C-1,  (A.  14) 

in  practical  image  classification  applications,  provided  that  Sw  is  not  singular.  Thus,  the 

maximum  number  of  nonzero  eigenvectors  extracted  from  Eq.  (A.8)  is  equal  to  C-l. 

3.  Computing  the  Maximum  Rank  of  the  Within-Class-Scatter  Matrix 
Sw  and  of  the  Between-Class-Scatter  Matrix  SB 

a.  Maximum  Rank  of  the  Within-Class-Scatter  Matrix 

In  this  section,  we  show  that  the  TV-dimensional  within-class-scatter 

(WCS)  matrix  Sw  has  maximum  rank  equal  to  P-C,  where  P  is  the  number  of  element 

vectors  in  the  training  set,  and  C  is  the  number  of  classes. 
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According  to  Eqs.  (2.9)  and  (2.10)  from  Chapter  II.B.2,  the  A-dimensional 
WCS  matrix  Sw  of  the  training  data,  {x; }  p ,  is  defined  as: 

SW  =  Z  Z  ' (Xj  -  mi  KXJ  “  mi  )r  ’  (A- 1  5) 

i= 1  j= 1 


where  n.  and  m;.  are  defined  as  the  number  of  training  vectors  and  the  mean  vector  for 

class  i,  respectively.  Note  that  the  number  of  elements  per  class  may  vary.  As  a  result,  the 
total  number  of  element  vectors  is  equal  to  P,  i.e., 

fdni=P.  (A.  16) 

i=l 

Our  derivation  is  based  on  the  following  three  matrix  rank  properties: 

1 .  The  rank  of  the  ( N  x  N)  dimensional  matrix  xxT ,  where  x  is  a  column  vec¬ 
tor  of  dimension  N,  is  equal  to  1  [Strang,  2003], 

2.  The  maximum  possible  rank  of  the  matrix  S,  defined  as: 

S  =  fJxlxiT,  (A.  17) 

7=1 

where  {x(.}  are  a  set  of  column  vectors,  is  equal  to  P. 

3.  The  A-dimensional  covariance  matrix  defined  from  the  set  of  element 
vectors,  {x(.}  p,  and  given  by: 

P 

Cov(X)  =  ^  (x;  -  m)(xt  -  m)T ,  (A.  1 8) 

/=! 


where  the  mean  column  vector  m  is  defined  as: 


1 


m  =  — 


P 


z 


x. 


(A.  19) 


has  possible  maximum  rank  equal  to  P  - 1,  due  to  the  fact  that  one  degree  of  freedom  is 
lost  by  the  subtraction  of  the  mean  element  vector  in  the  covariance  matrix  definition. 

Note  that  the  WCS  matrix  defined  in  Eq.  (A.  15)  can  be  viewed  as  the 
summation  of  C  class-specific  covariance  matrices,  as  defined  in  Eq.  (A.  18).  Using  prop¬ 
erties  2  and  3  listed  above,  each  ith  class-specific  covariance  matrix  can  be  shown  to  have 
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a  possible  maximum  rank  equal  to  (ni  - 1).  As  a  result,  the  overall  WCS  matrix,  derived 

from  the  summation  of  these  C  class-specific  covariance  matrices  has  a  maximum  rank 
equal  to  P-C,  because: 

f>,-l)  =  P-C.  (A.20) 

1=1 

Note  that  the  matrix  Sw  is  A-dimensional.  Thus,  it  is  nonsingular  only 
when  ( P-C)>  N,  [Strang,  2003].  Further,  recall  that  P  and  N  represent  the  number  of 
images  and  the  dimension  of  the  reshaped  images  as  column  vectors,  respectively.  Usu¬ 
ally  the  number  of  images  is  smaller  than  the  image  dimensions,  due  to  limitations  in  im¬ 
age  collection,  resulting  in  a  singular  Sw  matrix. 

b.  Maximum  Rank  of  the  Between-Class-Scatter  Matrix 
Next,  we  show  that  the  maximum  rank  of  the  Between-Class-Scatter- 
Matrix  (BCS)  SB  is  equal  to  C-l. 

Recall  that  the  N-dimensional  BCS  matrix  SB  is  defined  as: 

c 

Sb=Yj  ni  ( mi  -  ~mY  ’  (A.21) 

i=l 

where  m  is  the  mean  of  all  element  vectors,  ni  is  the  number  of  element-vectors,  and  m; 
is  the  mean  vector  of  class  i,  respectively. 

Note  that  the  matrix  SB  is  of  the  same  type  as  the  covariance  matrix  ex¬ 
pression  defined  in  Eq.  (A.  18).  Thus,  it  has  a  maximum  possible  rank  equal  to(C-l), 

due  to  the  fact  that  one  degree  of  freedom  is  lost  by  the  subtraction  of  the  mean  element 
vector  in  the  covariance  matrix  definition. 
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APPENDIX  B.  MATLAB  SOURCE  CODE 


This  appendix  lists  all  Matlab  programs  used  in  the  study.  These  programs  have 
been  developed  or  borrowed  from  other  references  and  modified  according  to  the  re¬ 
quirements  of  the  current  study.  The  Matlab  programs-functions  used,  but  not  developed 
or  modified  within  the  current  study,  are  only  referenced  without  their  Matlab  code  list¬ 
ing  being  provided. 

Below  are  listed  the  five  categories  of  experiments  being  conducted  in  the  subject 
study.  The  first  program  in  each  category  is  the  one  to  be  executed,  whereas  the  follow¬ 
ing  programs/functions  are  being  called  by  it. 

A.  DESCRIPTION 

1.  PCA-LDA  Comparison 

•  PCAvsLDA.m :  Creates  two  2-dimensional  classes,  applies  PCA  and  LDA 
projection  on  them  and  compares  the  two  projections. 

•  pca.m :  Applies  the  PCA  method  on  the  data,  [Lee,  2004], 

•  fld.m :  Applies  the  LDA  method  on  the  data,  [Lee,  2004], 

•  sortem.m :  Sorts  eigenvectors  by  decreasing  eigenvalue  magnitude  [Lee, 
2004], 

2.  Distance  Selection  -3-Dimensional  Plots  Creation 

•  DistanceSelection.m :  Applies  the  five  distances  analyzed  in  Chapter 
IV.A.2,  to  the  training/testing  images  and  the  classes’  centroids  to  produce 
the  respective  3-dimensional  and  associated  contour  plots.  Additionally, 
creates  the  confidence  score  3-dimensional  maps  when  selecting  the  Ma- 
halanobis  Angular  distance. 

•  dataST.m:  Normalizes  input  data  by  setting  mean  to  0  and  standard  devia¬ 
tion  to  1.  (From  [Baudat,  October  2000].). 

•  buildGDA  Opt.m:  Applies  the  GDA  algorithm  to  the  input  data  and  cre¬ 
ates  the  GDA  data,  which  will  be  used  for  the  input  data  projection  (from 
[Baudat,  October  2000]).  This  function  is  modified  as  commented,  in  or¬ 
der  to  vectorize  the  steps  involved  in  the  computations  of  the  GDA  projec¬ 
tion  vectors. 

•  eigensystem.m :  Performs  eigenvalue-eigenvector  decomposition  of  the  in¬ 
put  matrix  and  sorts  the  eigenvectors  by  decreasing  eigenvalue  magnitude 
(From  [Baudat,  October  2000].). 
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•  spreadGDA_Opt.m :  Projects  the  input  data  onto  the  GDA  eigenvectors 
(from  [Baudat,  October  2000]).  This  function  is  modified  to  vectorize  the 
GDA  projection  step  for  higher  computational  efficiency. 

•  KernelFunctiont.m :  It  is  called  only  when  the  Gaussian  kernel  function  is 
selected  by  setting  the  parameter  “degree”  equal  to  0  (From  [Baudat,  Oc¬ 
tober  2000].). 

3.  GDA  Classification  Performance  Measurement 

•  ClassificationPerformance.m :  Computes  and  saves  the  rank-(l-50)  scores 
and  the  classification  performance  per  class. 

•  dataST.m :  As  above. 

•  buildGDAOpt.m:  As  above. 

•  eigensystem.m:  As  above. 

•  spreadGDAOpt.m:  As  above. 

•  KernelFunctiont.m:  As  above. 

4.  Rank  Score  1  Mean  Confidence  Interval  Creation 

•  Confidencelnterval.m :  Derives  and  plots  the  95%  confidence  intervals  ob¬ 
tained  for  rank-1  to  -5  scores  for  average  classification  performances. 

5.  LDA  versus  Fisherface  Comparison,  Using  GDA  with  Polynomial 
Kernel  Function  of  Degree  1  as  LDA 

•  PCAGDAl.m :  Computes  classification  performances  for  the  LDA  method, 
(implemented  using  the  GDA  with  polynomial  kernel  function  of  degree 
1),  and  for  the  Fisherface  method  (implemented  using  the  PCA  and  GDA 
with  polynomial  kernel  of  degree  1). 

•  pca.m :  As  above. 

•  sortem.m :  As  above. 

•  buildGDA  Opt.m:  As  above. 

•  eigensystem.m:  As  above. 

•  dataST.m :  As  above. 

•  spreadGDA  Opt.m:  As  above. 


B.  MATLAB  CODE  LISTING 
•  PCAvsLDA.m: 

%%  Filename:  PCAvsLDA.m 

%%  Thesis  Advisor:  Prof.M.P.Fargues  Naval  Postgraduate  School 
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%%  Author:  Captain  Dimitrios  I.  Domboulas 

%%  Hellenic  Air  Force 


%%  Date: 

%%  Description: 

%% 

%% 

%% 


May  2004 

1.  Creates  random  two  classes  of  2-dimensional  input  vectors 
and  apply  PCA  and  LDA. 

2.  Plots  the  projections' 

of  the  two  schemes  and  compares  their  separability. 


%%  Input: 
%%  Outputs: 


N/A 

2-Dimensional  projection  plot 


%%  Functions  used:  pca.m,  fld.m,  sortem.m 


clc; 

clear; 

ml=[2;3];  %%  mean  for  class  1 
sxl=2.8;  %%  standard  deviation  class  1 
sy  1=0.4; 

m2=[2;4];  %%  mean  for  class  2 
sx2=2.7;  %%  standard  deviation  class  2 
sy2=0.4; 

%  create  2-D  data  matrices 
for  k=l:5; 

xl(:,k)=ml+[sxl;syl].*rand(2,l);  %  create  class  1 
x2(:,k)=m2+[sx2;sy2].*rand(2,l);  %  create  class  2 
end 


%%  use  of  PCA 
xl2=[xl,x2]; 

[W 1  ,m  1 ,  Amean  1  ,E  VA 1  ]=pca(x  1 2,k); 

%%  center  each  class==>subtract  the  mean  vector  of  each  class 
[m,n]=size(xl); 

mxl=xl-mean(xl,2)*ones(l,n); 

mx2=x2-mean(x2,2)*ones(l,n); 

%%  1  D  Projection 
pxll=Wl(:,l)'*xl; 
pxl2=Wl(:,l)'*x2; 

figure 

whitebg('w'); 

plot([-10,10]*Wl(l,l),[-10,10]*Wl(2,l));grid  on;hold  on; 
scatter(  xl(l,:) ,  xl(2,:) );  grid  on; 

%axis([-l  0,1 0,-10,10]); 
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axis([-2,6,-4,6]); 
hold  on; 

scatter(  x2(l,:) ,  x2(2,:),  ); 

hold  on; 

%  plot  the  1-D  projections 
scatter(pxl  l*Wl(l,l),pxl  1*W1(2,1)); 
scatter(pxl2*Wl(l,l),pxl2*Wl(2,l),'*'); 

%%  use  of  LDA 

C=[  1,1, 1,1, 1,2, 2, 2, 2, 2]; 

[W2,D]=fld(xl2,C); 

%%  find  projected  data  (LDA  eigenspace) 

px21=W2'*xl; 

px22=W2'*x2; 


%  plot  the  LDA  projection  line 

plot([-l  0, 1 0]  *  W2(  1),[— 10,10]  *  W2(2),'g — ');grid  on;hold  on; 
hold  on; 

%  plot  the  1-D  projections 
scatter(px2 1  * W2(  1  ),px2 1  * W2(2),'g'); 
scatter(px22*W2(l),px22*W2(2),'g*'); 
legend('PCA','LDA' ); 

title('PCA  vs  LDA  Projections');xlabel('x');ylabel(y); 

•  DistanceSelection.m: 


%% 


%%  Filename: 

%%  Thesis  Advisor: 
%%  Author: 

%% 

%%  Date: 

%%  Description: 

%% 

%% 

%% 

%%  Parameters: 

%% 

%% 

%% 


%% 

%%  Outputs: 

%% 

%%  Functions  used: 


%% 


DistanceSelection.m 

Prof.M.P.Fargues  Naval  Postgraduate  School 
Captain  Dimitrios  I.  Domboulas 
Hellenic  Air  Force 
July  2004 

Produces  3-dimensional  distance  map  plots  and  associated  con¬ 
tours  for  various  types  of  distances  between 
training/testing  images  and  class  centroids. 

Also  produces  the  3-dimensional  confidence  score  plots/contours. 

1 .  Kernel  function  type  (Gaussian,  Polynomial) 

(degree  of  polynomial  kernel) 

2.  Number  of  K  matrix  eigenvectors  kept 

3.  Distance  type  to  be  used 

4.  Number  of  iterations 

1 .  Respective  3-dimensional  plots 

2.  Respective  contours. 

buildGDAOpt.m,  eigensystem.m,  dataST.m, 
spreadGDA  Opt.m,  KemelFunction.m 
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clc; 

clear; 

load  Aall; 

clear  img  A  Db  Dme  x  ans  img  name  N_class  j  k  m  n  s  si  sqrsl  teml  tem2  time; 
clear  IndSamples  N  objects  Section  imnuml  im_num2  Person  T  v  w; 

Atemp  =  double(A_all); 
clear  A_all; 

%%  CONSTANTS  '  DECLARATION 

Train_Class_Size  =  18;  %%  Number  of  Training  Images  per  Class 
Test_Class_Size  =  12;  %%  Number  of  Testing  Images  per  Class 
Num_Train_Imag  =  900;  %%  Total  Number  of  Training  Images 
Num_Test_Imag  =  600;  %%  Total  Number  of  Testing  Images 
Num_Class  =  50;  %%  Total  Number  of  Classes 
cs  =  [18,18,18,18,18,18,18,18,18,18]; 

Class_Sizes  =  [cs,  cs,  cs,  cs,  cs];  %%(1  X  50)  vector,  with  class  sizes  per  class 

degree  =  2;  %%%  select  Polynomial  kernel  degree,  USE  0  ==>  for  GAUSSIAN  kernel 

ONLY 

Num_ev  =  0;  %%%  input  Number  of  K  matrix  eigenvectors  (0  =>  Default) 
select_dist  =  2;  %%%  select  distance 

Numlter  =  1 ;  %%%  Number  of  MAIN  LOOP  ITERATIONS 

for  loop  =  1  :  Num  lter;  %%%%%  MAIN  ITERATION  LOOP  %%%%%%%%% 
tic  %%  start  timing  loops 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%%%%  Random  Permutation  Section  Borrowed  from  [Lee,  2004]  and  modified  by 
%%%%  [DOMBOULAS,  2004] 

A  =  Atemp;  %%  A  contains  the  Training  Images  (2700x900) 

B  =  [];  %%  B  contains  the  testing  images  (2700x600) 

Delete  =  []; 

h  =  waitbar(0,  'Random  Permutation');  %%%%  WAITBAR 
for  i=l  :  150  %  i  goes  from  1  to  the  total  number  of  pictures 
waitbar(  i/150  );  %%%%  WAITBAR 

rp  =  randperm(lO); 

LI  =  (i-1)  *10+1; 

L2  =  i*10; 

A(  :  ,  LI  :  L2  )  =  A(  : ,  (i-1)  *  10  +  rp  );  %%  permute  images 
IndSamples  =  [LI  Ll+1  Ll+2  Ll+3]; 

B  =  [B  A( : ,  IndSamples  )]; 

Delete  =  [Delete  IndSamples]; 
end 

close(h);  %%%%  WAITBAR 

A( : ,  Delete )  =  []; 
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%%%%  Section  Borrowed  from  [Lee,  2004]  and  modified  by  [Domboulas,  2004] 

%%%%%%%0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o%%% 

A  =  DataSt(  A );  %%  Mean  =0,  Standard  Deviation  =  1 
B  =  DataSt(  B  );  %%  Mean  =0,  Standard  Deviation  =  1 
Le  =  A';  %%%  Learning  Data  (900  Images  are  rows) 

Te  =  B';  %%  600  Testing  images  of  our  Data  Base  (Images  are  rows) 
[dataGDA,centeredkM,kM]  =  buildGDA_Opt(  Le  ,  Class  Sizes  ,  degree,  Num_ev ); 
%%run  GDA 

Pgda  A  =  spreadGDA_Opt(  Le,  Le,  dataGDA,  degree );  %%  Projected  Training  images 
in  GDA 

Pgda  B  =  spreadGDA_Opt(  Te,  Le,  dataGDA,  degree  );  %%  Projected  Testing  images  in 
GDA 

%%%  FIND  VARIANCE  of  each  column  of  the  900X49  Projected  Train  Data  Matrix. 
%%%  Covariance  is  used  for  other  Image  Distance  types 
for  k  =  1  :  size(  Pgda  A  ,2); 

Pr_Tr_Dat_Cov_Col(k)  =  cov(  Pgda_A(:,k) );  %%  Row  Vector  1X49  with  variances 
%%%  of  each  of  the  49  columns  of  the  Projected  Training  Data  Matrix  900X49 
end; 

Pr_Tr_Dat_Cov_Mat  =  cov(Pgda  A);  %%  49X49  covariance  matrix  of  Proj  Train  Data 
Matrix 

%%%%%%  It  is  Diagonal!! Invertible  &  Each  Dimension 's  features  are 
%%%%%%  independent  from  each  othert  ==>  It  is  the  Ideal  case 
Inv_Cov  =  inv(  Pr_Tr_Dat_Cov_Mat );  %%%  Inverse  Covar  Matr 
%%%  For  the  Approximated  Mahalanobis  Distance 
for  k  =  1  :  size(  Inv_Cov,2  ); 

s_inv(k)  =  Inv_Cov(k,k);  %%  ROW  VECTOR  1  x  49 
end; 

sr_s_inv  =  sqrt(s_inv(k));  %%  Row  Vector 

sr_inv_diag  =  diag(  sr_s_inv );  %%Diagonal  Matrix  with  diag  elem  the  sq  roots  of 
%%%  the  inverse  diag  elem  of  cov  matrix  49  x  49 

%%%%%%%%%%%%  END  VARIANCE  SECTION  %%%%%%%%%%%% 

%%  Find  Centroids  per  Each  Class  Projected  Training  Images 
for  k=  1  :  NumClass 

ProjTrainCentr(  k  , : )  =  mean(  Pgda_A(  ((k-l)*Train_Class_Size  +  1) :  ( k  * 

TrainClassSize ),:)); 

end; 

%%%  create  cell  array 

DistName  =  {'Norm-2  ';'Mahalanobis  '; 

'Mahalanobis  Norm-2  ';'Mahalanobis  Norm-1  ';'Mahalanobis  Angular'}; 

%%  Find  Various  Distances  of  Each  Training  Image  from  all  the  Classes  '  Centroids, 
%%  (TrainDist  900  X  50) 
for  k  =  1  :  Num  Train  lmag  ; 
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for  m  =  1 :  Num_Class; 

if  selectdist  ==  1 ;  %%  NORM-2  Distance 

TrainDist(k,m)  =  norm(  Pgda_A(k,:)  -  ProjTrainCentr(m,:),2  ); 

elseif  select_dist  ==  2;  %%  Mahalanobis  Distance 

TrainDist(k,m)  =  sqrt(  (Pgda_A(k,:)  -  ProjTrainCentr(m,:))  *  Inv_Cov  * 

(Pgda_A(k,:)  -  ProjTrainCentr(m,:))' ); 

elseif  select  dist  ==  3;  %%  Mahalanobis  NORM-2  Distance 
temp  =  (  sr_s_inv  .*  (  Pgda_A(k,:)  -  ProjTrainCentr(m,:)  )  )  *  (  sr_s_inv  .*  ( 
Pgda_A(k,:)  -  ProjTrainCentr(m,:) ) )' ; 

TrainDist(k,m)  =  sqrt(  sum(  temp  )  );  %%  Approxim  Mahalan  Dist 

elseif  select_dist  ==  4;  %%  Mahalanobis  NORM-1  Distance 
TrainDist(k,m)=sr_s_inv  .*  norm(  Pgda_A(k,:)  -  ProjTrainCentr(m,:),l  ); 

else  %%%%  MAHALANOBIS  ANGULAR  DISTANCE 
TrainDist(k,m)  =  acos(  (  (  Pgda_A(k,:)  *  sr_inv_diag  )  *  (  sr_inv_diag  *  ProjTrain- 
Centr(m,:)' ) )  /  ( norm(  Pgda_A(k,:).*sr_s_inv  ,2  )  *  norm(  ProjTrain- 
Centr(m,:).*sr_s_inv  ,2))); 

end;  %%%  end  if 
end; 
end; 

%%%  Plot  in  3-D  the  900  Projected  Training  Images  1  Distances  from  the 
%%%  Classes  ’  Centroids 

for  n=l:Num_Train_Imag  %%  Create  Training  Images  1  Indeces  for  the  “mesh”  function 
Pgda_AIndex(n)=n; 
end; 

for  d=l:Num_Class  %%  Create  Classes  '  Indeces  for  the  “mesh”  function 
Class_Index(d)  =  d; 
end; 
figure; 

mesh(Class_Index,Pgda_AIndex,TrainDist); 

xlabel('Class  Index');ylabel('Training  Image  Index');zlabel('Distance'); 
title(  DistName(select_dist) );  %%%  Num_Eigv(select_file) 
figure; 

mesh(Class_Index,Pgda_AIndex,TrainDist); 

xlabel('Class  Index');ylabel('Training  Image  Index');zlabel('Distance'); 
title(  DistName(select_dist) );  %%%  Num_Eigv(select_file) 
view(0,90);  %%  view  3-D  plot  from  top 
axis([  1,50, 1,900]); 

%%%  Find  Various  Distances  for  Testing  Images 
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for  k  =  1  :  Num_Test_Imag  ; 
for  m  =  1 :  Num_Class; 

if  select_dist  ==  1;  %%  NORM-2  Distance 

TestDist(k,m)  =  norm(  Pgda_B(k,:)  -  ProjTrainCentr(m,:),2  ); 

elseif  selectdist  ==  2;  %%  MAHALANOBIS  DISTANCE 

TestDist(k,m)  =  sqrt(  (Pgda_B(k,:)  -  ProjTrainCentr(m,:))  *  Inv_Cov  *  (Pgda_B(k,:)  - 
ProjTrainCentr(m,:))' );  %%  Mahalanobis  Distance 

elseif  select  dist  ==  3;  %%  Mahalanobis  NORM-2  Distance 
temp  =  (  sr_s_inv  .*  (  Pgda_B(k,:)  -  ProjTrainCentr(m,:) ) )  *  (  sr_s_inv  .*  ( 
Pgda_B(k,:)  -  ProjTrainCentr(m,:)  )  )' ; 

TestDist(k,m)  =  sqrt(  sum(  temp  ) );  %%  Approxim  Mahalan  Dist 

elseif  select  dist  ==  4;  %%  Mahalanobis  NORM-1  Distance 
TestDist(k,m)=sr_s_inv  .*  norm(  Pgda_B(k,:)  -  ProjTrainCentr(m,:),l  ); 

else  %%%%  MAHALANOBIS  ANGULAR  DISTANCE 

TestDist(k,m)  =  acos(  ( (  Pgda_B(k,:)  *  sr_inv_diag )  *  (  sr_inv_diag  *  ProjTrain- 
Centr(m,:)'  )  )  /  (  norm(  Pgda_B(k,:).*sr_s_inv  ,2  )  *  norm(  ProjTrain- 
Centr(m,:).*sr_s_inv  ,2))); 

end;  %%%  end  if 

end 

end 

%%%  Plot  in  3-D  the  600  Projected  Testing  Images  '  Distances  from  the 
%%%  Classes '  Centroids 

for  n=l:Num_Test_Imag  %%Create  Testing  Images'  Indeces  for  “mesh”  function 
Pgda_BIndex(n)=n; 
end; 

for  d=l:Num_Class  %%  Create  Classes  '  Indeces  for  the  “mesh”  function 
Classlndex(d)  =  d; 
end; 
figure; 

mesh(Class_Index,Pgda_B  Index, TestDist); 

xlabel('Class  Index');ylabel('Testing  Image  Index');zlabel('Distance'); 
title(  DistName(select_dist) );  %%%  Num_Eigv(select_file) 
figure; 

mesh(Class_Index,Pgda_B  Index, TestDist); 

xlabel('Class  Index');ylabel('Testing  Image  Index');zlabel('Distance'); 
title(  DistName(select_dist) );  %%%  Num_Eigv(select_file) 
view(0,90);  %%  view  3-D  plot  from  top 
axis([  1,50, 1,600]); 
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%%%  Assign  1-50  Class  Indeces  for  each  of  the  600  test  Images 
T_B  =  [1,1, 1,1, 1,1, 1,1, 1,1, 1,1]; 
for  m  =  1  :  NumClass; 

T_B(  (  TestClassSize  *  (m-1)  +  1  ) :  (  Test_Class_Size  *  m )  )  =  m  *  T_B( 
l:Test_Class_Size) ; 

end;  %%%  T_B  contains  the  class  number  of  each  testing  image  (1X600) 

%%%  Sort  Distances  per  Testing  Image  -  RANK 
for  k  =  1  :  Num_Test_Imag  ; 
tempi  =  sort(  TestDist(k,:) ); 

I(k)  =  find(  (tempi  ==  TestDist(k,T_B(k))*ones(l,Num_Class)) ); 
end; 

%%%  Performance  Measurement  Per  Class 
for  k  =  1  :  Num_Class; 

temp2  =  find(  I(  ((k-l)*Test_Class_Size  +  1) :  ( k  *  Test_Class_Size  ) )  ==  ones( 
l,Test_Class_Size ) ); 

Perf_Class(k)  =  size(  temp2 ,2)1  Test_Class_Size; 
end; 
figure; 

stem(Perf_Class);grid  on; 
title('Classification  Performance  per  Class'); 
xlabel('Class  Index');ylabel('Classification  Performance'); 

%%%  Find  the  probability  of  each  of  the  50  Ranks  assigned  to  all  the  testing  images 
for  k  =1  :  Num  Class  ; 

temp3  =  find(  I  ==  k  *  ones(  l,Num_Test_Imag  ) ); 

Prob(k)  =  size( temp3 ,2)1  Num_Test_Imag; 
end; 

%%%%  RANK  SCORE  Evaluation  -  Cummulative  Probability  of  Ranks  -  Incresing 
Order 

Rank(l)  =  Prob(l); 
for  k  =  2  :  Num_Class  ; 

Rank(  k )  =  Rank(  k-1 )  +  Prob(  k );  %%Rank  is  1X50  with  accumulated  Probability 
end; 
figure; 

stem(Rank);  grid  on;  axis([0,52,0.8,l.l]); 
title('Cummulative  Rank  Score  for  600  Testing  Images'); 
xlabel('Rank  Index');ylabel('Cumulative  Rank  Score'); 

if  select  dist  ==  5;  %%%  Confidence  Score  ONLY  for  Mahal  Angular  distance 
%%%  Find  Confidence  Score  for  each  Test  Image.  This  will  be  1X50  vector  per 
%%%  image,  which  will  assign  the  confidence  assignment  of  the  test  image  to  each  class. 
mm=2;  %%%  parameter  for  confidense  calculation 
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for  j  =  1  :  Num_Test_Imag  ; 
for  k  =  1  :  Num  Class  ; 

%%  d  has  the  distances  of  proj  test  imag  j  from  all  the  proj  class  centroids 
%  %  d(k,j)  =  norm(  Pgda_B(j,:)  -  ProjTrainCentr(k,:)  ,  2  );  %%%%  ONLY  for 
NORM-2  distance 

%%%%  ONLY  for  Mahalanobis  Angular  Distance 

d(j,k)  =  acos(  ( (  Pgda_B(j,:)  *  sr_inv_diag )  *  (  sr_inv_diag  *  ProjTrainCentr(k,:)' ) ) 
/  ( norm(  Pgda_B(j,:).*sr_s_inv  ,2  )  *  norm(  ProjTrainCentr(k,:).*sr_s_inv  ,2  ) ) ); 
end 

dd=d';  %%%  ONLY  For  Mahalanobis  Angular 
for  i  =  1  :  Num_Class  ; 

%%  u  has  the  confidence  of  assigning  test  imag  j  to  class  k 

u(i,j)  =  1  /  sum(  (  dd(i,j)  *  ones(Num_Class,l)  ./  dd(:,j)  )  .A(2/(mm-l))  );  %%%% 
NORM-2  CASE 
end 
end 

uu=u';  %%%  ONLY  For  Mahalanobis  Angular 

%%%  Plot  in  3-D  the  600  Projected  Testing  Images  '  Distances  from  the 
%%%  Classes '  Centroids 

for  n  =  1  :  NumTestlmag;  %%Create  Testing  Images'  Indeces  for  the  “mesh”  function 
PgdaBIndex(n)  =  n; 
end; 

for  d  =  1  :  Num_Class;  %%  Create  Classes  '  Indeces  for  the  “mesh”  function 
Classlndex(d)  =  d; 
end; 
figure; 

mesh(Class_Index,Pgda_BIndex,uu); 

xlabel('Class  Index');ylabel('Testing  Image  Index');zlabel('Score'); 
title(  'Testing  Images  Confidence  Score' );  %%%  Num_Eigv(select_file) 
axis([  1,50,1 ,600,0, 1  ]); 
figure; 

mesh(Class_Index,Pgda_BIndex,uu); 

xlabel('Class  Index');ylabel('Testing  Image  Index');zlabel('Score'); 

title(  'Testing  Images  Confidence  Score' );  %%%  Num_Eigv(select_file) 

axis([  1,50, 1,600, 0,1]); 

view(0,90);  %%  view  3-D  plot  from  top 

end;  %%%  end  if 

Rank_718(loop,:)  =  Rank; 

Perf_Class_718(loop,:)  =  Perf_Class; 

clear  A  TA  B  TB  f  LI  L2  Le  Te  dataGDA  centeredkM  kM; 
clear  Pgda  A  Pgda  B  k  j  Pr_Tr_Dat_Cov_Col  Pr_Tr_Dat_Cov_Mat; 
clear  TrainDist  TestDisttemp  Pgda  Aindex  Class  lndex; 
clear  Inv_Cov  s  inv  sr_inv_diag  ProjTrainCentr  DistName; 
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clear  Pgda  B Index  Classlndex  T_B  tempi  I  temp2  temp3; 
clear  Prob  Delete  d  diml  dim2  h  i  m  n  sr  s  inv; 
clear  Rank  Perf  Class  Pgda  AIndex  TestDist  cs  rp; 

time(loop)  =  toe;  %%  Stop  timing  each  Main  Iteration  Loop 

end;  %%%%%%  END  OF  MAIN  ITERATION  LOOP  %%%%%%%%%%%%% 

•  buildGDAOptm: 


%%  Filename:  buildGDA_Opt.m 

%%  Thesis  Advisor:  Prof.M.P.Fargues  Naval  Postgraduate  School 

%%  Author:  G.  Baudat,  F.  Anouar  (“buildGDA.m”-2 1  October  2000) 

%%  Modified  by:  Captain  Dimitrios  I.  Domboulas 
%%  Hellenic  Air  Force 


Date  modified:  July  2004 


Description: 

Inputs: 


%% 

%% 

%% 

%% 

%% 

%% 

%% 

%%  Outputs: 

%% 

%% 

%%  Functions  used:  eigensystem.m,  KemelFunction.m 

%% 


Creates  the  GDA  data  from  the  input  training 
data. 

1.  Normalized  training  data  matrix  (m=0,stdev=l) 

2.  Vector  with  number  of  element  vectors  per  class 

3.  Kernel  Function  type  and  degree 

4.  Number  of  Kernel  matrix  K  eigenvectors  kept 

1 .  GDA  data 

2.  Centered  K  matrix 

3.  Uncentered  K  matrix 


%%%  Modified  041115 

function  [dataGD  A, centeredkM,kM]=BuildGDA_Opt(L,S, degree, Numev) 

n=length(S); 

m=length(L(:,l)); 

mM=ones(m)/m; 

%%%  Create  kernel  matrix  (uncentered) 

if  degree  ==  0;  %%%  ADDED  by  D.  Domboulas 

%%%%  Added  waitbars  by  D.  Domboulas  for  code  optimization 

kM=zeros(m); 

h=waitbar(0, 'Create  Matrix  K');  %%%%%%  WAITBAR 

for  i=l:m 

waitbar(i/m,h);  %%%%%%  WAITBAR 

for  j=l:m 

kM(i,j)=KemelFunction(L(i,:),L(j,:)); 

end 

end 
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close(h); 


%%%%%%  WAITBAR 


%%%  ADDED  by  D.DOMBOULAS  FOR  POLYNOMIAL  KERNEL  FUNCTIONS 

ONLY 

else 

kM  =  (  L  *  L'  ).Adegree;  %%%Matrix  Vectorization 
end;  %%%  end  if 

%%%  ADDED  by  D.DOMBOULAS  FOR  POLYNOMIAL  KERNEL  FUNCTIONS 
ONLY 

sum K  =  inM  *  kM; 

%  center  the  kernel  matrix 
centeredkM=kM-sumK'-sumK+sumK*mM; 
clear  mM; 

%  decomposition  of  the  centered  kernel  for  beta  eigenvectors 
[vecCkM,valCkM]=eigensystem(centeredkM); 

%%%%%%%%%%%  Plot  K  matrix  Eigenvalues  ==>  ADDED  by  D.DOMBOULAS 
for  k=l  :size(valCkM) 

V  alK(k)=valCkM(k,k) ; 
end 

ValK(l:50); 

figure;plot(ValK);grid  on; 

xlabel('Eigenvalue  Index');ylabel('Eigenvalue  Magnitude'); 
title('GDA  K  Matrix  Eigenvalues'); 

%%%%%%%%%%  Plot  K  matrix  Eigenvalues  ==>  ADDED  by  D.DOMBOULAS 
min Val=valCkM(  1,1)/ 1000;  %define  the  lowest  eigen  value  used 
if  Num_ev  ==  0 

%  %  rankValCkM=250;  %%%  Test  Reliability  of  Results 
rankValCkM=m;  %%%  Default  numbre  of  eigvectors  kept. 

%%%%%%  ADDED  by  D.  Domboulas 
else 

rankValCkM  =  Num  ev;  %%%  keep  this  number  of  eigvectors 
end 

%%%%%%%  ADDED  by  D.  Domboulas 

h=waitbar(0, ’Center  K  Matrix  with  Highest  Eigvalues’);  %%%%  WAITBAR 
for  i=l:m 

waitbar(i/m,h);  %%%%  WAITBAR 

if  valCkM(i,i)<minVal; 
valCkM(i,i)=0; 
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rankV  alCkM=rankV  alCkM- 1 ; 
end 
end 

close(h);  %%%%  WAITBAR 

valCkM=valCkM(  1  :rankValCkM,l  :rankValCkM); 
vecCkM=vecCkM(:,  1  :rankValCkM); 

%%%centered  K  matrix  with  only  the  highest  eigen  values/vectors 
centeredkM=vecCkM*valCkM*vecCkM'; 

%w  matrix  of  the  weights,  bloc  diagonal 
wM=zeros(m); 
stBloc=l; 
endBloc=0; 

h=waitbar(0, 'Create  Matrix  W'); 
for  i=l:n 

waitbar(i/n,h); 
endBloc=endBloc+S(i); 
for  j=stBloc:endBloc 
for  k=stBloc:endBloc 
wM(j,k)=l/S(i); 
end 
end 

stBloc=stBloc+S(i); 
end 

close(h); 

%compute  alpha  normalized  vectors 
nb  Axes=min(  [rankV  alCkM] ,  [n- 1  ] ) ; 
[vecBeta,valBeta]=eigensystem(vecCkM'*wM*vecCkM); 
tmp=zeros(  1  ,nbAxes); 

h=waitbar(0, 'Create  TEMP  Matrix'); 
for  i=l:nbAxes  ; 
waitbar(i/nbAxes,h); 
tmp(i)=valBeta(i,i); 
end 

close(h); 

%  fprintf('Inertia:  %f\n',tmp); 
tmp=vecCkM*inv(valCkM)*vecBeta; 
norAlpha=zeros(m,nbAxes); 

h=waitbar(0, 'Create  norAlpha  matrix');  %%%%  WAITBAR 

for  i=l:nbAxes  ; 

waitbar(i/nbAxes,h);  %%%%  WAITBAR 


%%%%  WAITBAR 
%%%%  WAITBAR 

%%%%  WAITBAR 


%%%%  WAITBAR 
%%%%  WAITBAR 


%%%%  WAITBAR 
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norAlpha(:,i)=tmp(:,i)/sqrt(tmp(:,i)'*centeredkM*tmp(:,i)); 

end 

close(h);  %%%%  WAITBAR 

sumK=sumK(  1 , 

dataGDA=struct('norAlpha',norAlpha,'sumK',sumK); 

•  spreadGDAOptm: 


%% 


%%  Filename: 

%%  Thesis  Advisor: 
%%  Author: 

%%  Modified  by: 

%% 


spreadGDAOpt.m 

Prof.M.P.Fargues  Naval  Postgraduate  School 
G.  Baudat,  F.  Anouar  (“buildGDA.m”-21  October  2000) 
Captain  Dimitrios  I.  Domboulas 
Hellenic  Air  Force 


%%  Date  modified: 
%%  Description: 

%% 

%%  Inputs: 

%% 


%% 


%% 


July  2004 

Projects  the  input  GDA  data  on  to  Kernel  matrix 
K  eigenvectors. 

1.  Normalized  testing  data  (m=0,  stdev=l) 

to  be  projected  on  the  GDA  data  eigvectors 

2.  Normalized  training  data  (m=0,  stdev=l), 
which  is  used  to  build  the  GDA  data 


%% 

%% 

%%  Outputs: 


3.  GDA  data 

4.  Kernel  Function  type  and  degree 
Projected  data  on  the  Kernel  matr  K  eigvectors 


%%  Functions  used:  KernelFunction.m 


function  projectedVectors=SpreadGDA_Opt(T,L,dataGDA,  degree); 
n=length(T(:,l));  %size  of  the  test  set 
m=length(L(:,l));  %size  of  the  learning  set 
KernelEva=zeros(n,m); 

%%%  Waitbars  added  by  D.  Domboulas  for  code  optimization 
if  degree  ==  0;  %%%  ADDED  by  D.  Domboulas 
h=waitbar(0, 'Center  K  Matrix  for  SPREADGDA');  %%%%  WAITBAR 
for  i=l:n 

waitbar(i/n,h);  %%%%  WAITBAR 

tmp=zeros(l,m); 
for  j=l:m 

tmp(j)=KemelFunction(T(i,:),L(j,:)); 

end 

KemelEva(i,  :)=tmp-sum(tmp)/m; 
end 

close(h);  %%%%  WAITBAR 
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%%%%%%%%%%  ADDED  by  D.  DOMBOULAS  for  Matrix  Vectorization 

%%%%%%%%%%  ONLY  for  POLYNOMIAL  KERNELS 

else 

tempi  =  (T*L').Adegree;  %%%  Matrix  Vectorization 

KernelEva  =  tempi  -  mean( temp  1,2)  *  ones(l,m);  %%  Remove  Mean 

end;  %%%  end  if 

%%%%%%%%%%  ADDED  by  D.  DOMBOULAS  for  Matrix  Vectorization 
%%%%%%%%%%  ONLY  for  POLYNOMIAL  KERNELS 

inter=dataGD  A.  sumK-sum(dataGD  A.  sumK)/m; 

h=waitbar(0,'  PROJECTION  SPREADGDA');  %%%%  WAITBAR 

for  i  =  1  :  n; 

waitbar(i/n,h);  %%%%  WAITBAR 

KemelEva(i, :  )=KemelE  va(i, :  )-inter ; 
end 

projectedVectors=KemelEva*dataGDA.norAlpha; 
close(h);  %%%%  WAITBAR 

•  ClassificationPerformance.m: 


%% 

%%  Filename: 


ClassificationPerformance.m 


%%  Thesis  Advisor:  Prof.M.P.Fargues  Naval  Postgraduate  School 


%%  Author: 

%% 

%%  Date: 

%%  Description: 

%% 

%%  Parameters: 
%% 

%% 

%% 

%%  Outputs: 

%% 

%% 

%%  Functions  used: 

%% 

%% 


Captain  Dimitrios  I.  Domboulas 
Hellenic  Air  Force 
July  2004 

Computes  classification  performance  of  the  GDA 
method,  for  various  kernel  functions 

1.  Kernel  function  type  (Polyn  degree/Gaussian) 

2.  Number  of  K  matrix  eigenvectors  kept 

3.  Distance  type  to  be  used 

4.  Number  of  iterations 

1.  Rank-(l-50)  Scores  (in  “.mat”  file) 

2.  Classification  Performance  per  Class 
(in  “.mat”  file) 

buildGDAOpt.m,  eigensystem.m,  dataST.m, 
spreadGDAOpt.m,  KemelFunction. 


clc; 

clear; 

tic  %%  start  timing  1000  iterations 
load  Aall; 

clear  img  A  Db  Dme  x  ans  img  name  N_class  j  k  m  n  s  si  sqrsl  tern  1  tem2  time; 
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clear  IndSamples  N  objects  Section  im_numl  im_num2  Person  T  v  w; 

Atemp  =  double(Aall); 
clear  A_all; 

%%  CONSTANTS  '  DECLARATION 

Train_Class_Size  =  18;  %%  Number  of  Training  Images  per  Class 
Test_Class_Size  =  12;  %%  Number  of  Testing  Images  per  Class 
Num_Train_Imag  =  900;  %%  Total  Number  of  Training  Images 
Num_Test_Imag  =  600;  %%  Total  Number  of  Testing  Images 
Num_Class  =  50;  %%  Total  Number  of  Classes 
cs  =  [18,18,18,18,18,18,18,18,18,18]; 

Class_Sizes  =  [cs,  cs,  cs,  cs,  cs];  %%(1  X  50)  vector,  with  class  sizes  per  class 
degree  =  2;  %%  select  Polynom  kernel  degree,  USE  0  ==>  for  GAUSSIAN  ker  ONLY 
Num_ev  =  0;  %%%  input  Number  of  K  matrix  eigenvectors  (0  =>  Default) 
select_dist  =  2;  %%%  select  distance 

Numlter  =  1 ;  %%%  Number  of  MAIN  LOOP  ITERATIONS 
h=waitbar(0,'  MAIN  ITERATION  LOOP’);  %%%%  WAITBAR 
for  loop  =  1  :  Num  lter;  %%%%%  MAIN  ITERATION  LOOP 

%%%%%%%%%%%%%0/o0/o0/o%0/o0/o0/o0/o0/o%%0/o%%0/o% 

waitbar(loop/Num_Iter  ,  h);  %%%%  WAITBAR 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%%%%  Random  Permutation  Section  Borrowed  from  [Lee,  2004]  and  modified  by 
%%%%  [DOMBOULAS,  2004] 

A  =  Atemp;  %%  A  contains  the  Training  Images  (2700x900) 

B  =  [];  %%  B  contains  the  testing  images  (2700x600) 

Delete  =  []; 

%  %  h  =  waitbar(0,  'Random  Permutation');  %%%%  WAITBAR 
for  i=l  :  150  %  i  goes  from  1  to  the  total  number  of  pictures 
%%  waitbar(  i/150 );  %%%%  WAITBAR 

rp  =  randperm(lO); 

LI  =  (i-1)  *10+1; 

L2  =  i*10; 

A( : ,  LI  :  L2  )  =  A( : ,  (i-1)  *  10  +  rp );  %%  permute  images 
Index  =  [LI  Ll+1  Ll+2  Ll+3]; 

B  =  [B  A( : ,  Index )]; 

Delete  =  [Delete  Index]; 
end 

%%  close(h);  %%%%  WAITBAR 

A( : ,  Delete  )  =  []; 

%%%%  Section  Borrowed  from  [Lee,  2004]  and  modified  by  [Domboulas,  2004] 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
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A  =  DataSt(  A );  %%  Mean  =0,  Standard  Deviation  =  1 
B  =  DataSt(  B  );  %%  Mean  =0,  Standard  Deviation  =  1 
Le  =  A';  %%%  Learning  Data  (900  Images  are  rows) 

Te  =  B';  %%  600  Testing  images  of  our  Data  Base  (Images  are  rows) 
[dataGDA,centeredkM,kM]  =  buildGDA_Opt(  Le  ,  Class  Sizes  ,  degree,  Num  ev  ); 
%%run  GDA 

PgdaA  =  spreadGDA_Opt(  Le,  Le,  dataGDA,  degree  );  %%  Projected  Training  images 
in  GDA 

Pgda  B  =  spreadGDA_Opt(  Te,  Le,  dataGDA,  degree  );  %%  Projected  Testing  images  in 
GDA 

%%%  FIND  VARIANCE  of  each  column  of  the  900X49  Projected  Train  Data  Matrix. 
%%%  Covariance  is  used  for  other  Image  Distance  types 
for  k  =  1  :  size(  Pgda  A  ,2); 

Pr_Tr_Dat_Cov_Col(k)  =  cov(  Pgda_A(:,k) );  %%  Row  Vector  1X49  with  variances 
%%%  of  each  of  the  49  columns  of  the  Projected  Training  Data  Matrix  900X49 
end; 

Pr_Tr_Dat_Cov_Mat  =  cov(Pgda_A);  %%  49X49  covariance  matrix  of  Proj  Train  Data 
Matrix 

%%%%%%  It  is  Diagonal!! Invertible  &  Each  Dimension 's  features  are 
%%%%%%  independent  from  each  othert  ==>  It  is  the  Ideal  case 
Inv_Cov  =  inv(  Pr_Tr_Dat_Cov_Mat );  %%%  Inverse  Covar  Matr 
%%%  For  the  Approximated  Mahalanobis  Distance 
for  k  =  1  :  size(  Inv_Cov,2  ); 

s_inv(k)  =  Inv_Cov(k,k);  %%  ROW  VECTOR  1  x  49 
end; 

sr_s_inv  =  sqrt(s_inv(k));  %%  Row  Vector 

sr_inv_diag  =  diag(  sr_s_inv  );  %%Diagonal  Matrix  with  diag  elem  the  sq  roots  of 
%%%  the  inverse  diag  elem  of  cov  matrix  49  x  49 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%END  VARIANCE  SEC¬ 
TION 

%%  Find  Centroids  per  Each  Class  Projected  Training  Images 
for  k=  1  :  NumClass 

ProjTrainCentr(  k  , : )  =  mean(  Pgda_A(  ((k-l)*Train_Class_Size  +  1) :  ( k  * 

TrainClassSize ),:)); 

end; 

%%%  create  cell  array 

DistName  =  {'Norm-2  ';'Mahalanobis  '; 

'Mahalanobis  Norm-2  ';'Mahalanobis  Norm-1  ';'Mahalanobis  Angular'}; 

%%%  Find  Various  Distances  for  Testing  Images 
for  k  =  1  :  Num_Test_Imag  ; 
for  m  =  1 :  Num_Class; 
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if  select_dist  ==  1;  %%  NORM-2  Distance 

TestDist(k,m)  =  norm(  Pgda_B(k,:)  -  ProjTrainCentr(m,:),2  ); 

elseif  selectdist  ==  2;  %%  MAHALANOBIS  DISTANCE 

TestDist(k,m)  =  sqrt(  (Pgda_B(k,:)  -  ProjTrainCentr(m,:))  *  Inv_Cov  *  (Pgda_B(k,:)  - 
ProjTrainCentr(m,:))' );  %%  Mahalanobis  Distance 

elseif  select  dist  ==  3;  %%  Mahalanobis  NORM-2  Distance 
temp  =  (  sr_s_inv  .*  (  Pgda_B(k,:)  -  ProjTrainCentr(m,:)  )  )  *  (  sr_s_inv  .*  ( 
Pgda_B(k,:)  -  ProjTrainCentr(m,:) ) )' ; 

TestDist(k,m)  =  sqrt(  sum(  temp  )  );  %%  Approxim  Mahalan  Dist 

elseif  select  dist  ==  4;  %%  Mahalanobis  NORM-1  Distance 
TestDist(k,m)=sr_s_inv  .*  norm(  Pgda_B(k,:)  -  ProjTrainCentr(m,:),l ); 

else  %%%%  MAHALANOBIS  ANGULAR  DISTANCE 

TestDist(k,m)  =  acos(  ( (  Pgda_B(k,:)  *  sr  inv  diag  )  *  (  sr  inv  diag  *  ProjTrain- 
Centr(m,:)' ) )  /  ( norm(  Pgda_B(k,:).*sr_s_inv  ,2  )  *  norm(  ProjTrain- 
Centr(m,:).*sr_s_inv  ,2  )  )  ); 

end;  %%%  end  if 

end 

end 

%%%  Assign  1-50  Class  Indeces  for  each  of  the  600  test  Images 
T_B  =  [1,1, 1,1, 1,1, 1,1, 1,1, 1,1]; 
for  m  =  1  :  NumClass; 

T_B(  ( Test_Class_Size  *  (m-1)  +  1 ) :  (  Test_Class_Size  *  m )  )  =  m  *  T_B( 
l:Test_Class_Size) ; 

end;  %%%  T_B  contains  the  class  number  of  each  testing  image  (1X600) 

%%%  Sort  Distances  per  Testing  Image  -  RANK 
for  k  =  1  :  Num_Test_Imag  ; 
tempi  =  sort(  TestDist(k,:) ); 

I(k)  =  find(  (tempi  ==  TestDist(k,T_B(k))*ones(l,Num_Class)) ); 
end; 

%%%  Performance  Measurement  Per  Class 
for  k  =  1  :  Num  Class; 

temp2  =  fmd(  I(  ((k-l)*Test_Class_Size  +  1) :  ( k  *  Test_Class_Size ) )  ==  ones( 
l,Test_Class_Size ) ); 

Perf_Class(k)  =  size( temp2 ,2)1  Test_Class_Size; 
end; 

%%%  Lind  the  probability  of  each  of  the  50  Ranks  assigned  to  all  the  testing  images 
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for  k  =1  :  Num_Class  ; 

temp3  =  find(  I  ==  k  *  ones(  l,Num_Test_Imag  ) ); 

Prob(k)  =  size( temp3 ,2)1  Num  Test  lmag; 
end; 

%%%%  RANK  SCORE  Evaluation  -  Cummulative  Probability  of  Ranks  -  Incresing 
Order 

Rank(l)  =  Prob(l); 
for  k  =  2  :  Num_Class  ; 

Rank(  k )  =  Rank(  k-1  )  +  Prob(  k );  %%  Rank  is  1X50  with  accumulated  Probability 
end; 

Rank_718(loop,:)  =  Rank; 

Perf_Class_718(loop,:)  =  Perf_Class; 

clear  A  TA  B  TB  f  LI  L2  Le  Te  dataGDA  centeredkM  kM; 
clear  Pgda  A  PgdaB  k  j  Pr_Tr_Dat_Cov_Col  Pr_Tr_Dat_Cov_Mat; 
clear  TrainDist  TestDisttemp  Pgda  Aindex  Classlndex; 
clear  Inv_Cov  s  inv  sr_inv_diag  ProjTrainCentr  DistName; 
clear  PgdaB  Index  Classlndex  T_B  tempi  I  temp2  temp3; 
clear  Prob  Delete  d  diml  dim2  i  m  n  sr_s_inv  Index; 
clear  Rank  Perf  Class  Pgda  AIndex  TestDist  cs  rp; 

%  %  save  Mah_Ang_Rank_PerfClas_GDA_0_7_040926  Rank_718  Perf_Class_7 1 8  ; 
save  Mah_Ang_Rank_PerfClas_GDA_2_041116  Rank_718  Perf_Class_718  ; 

end;  %%%%%%  END  OF  MAIN  ITERATION  LOOP  %%%%%%%%%%%%%% 
close(h);  %%%%%%%%%%%  WAITBAR 

time(loop)  =  toe;  %%  Stop  timing  each  Main  Iteration  Loop 

•  ConfidencelntervaLm: 


%%  Filename: 

%%  Thesis  Advisor: 
%%  Author: 

%% 

%%  Date: 

%%  Description: 

%% 

%% 

%% 

%% 

%%  Input: 

%%  Outputs: 

%% 


ConfidencelntervaLm 

Prof.M.P.Fargues  Naval  Postgraduate  School 
Captain  Dimitrios  I.  Domboulas 
Hellenic  Air  Force 
August  2004 

Plots  the  Confidence  Interval  of  the 
rank-1  to  rank-5  score  means  of  5  different 
distances  used  for  classification. 

Also,  computes  the  mean  rank-1  score  for  all 
“.mat”  files  created  by  “ClassificationPerformance.m” 
“.mat”  files  created  by  “ClassificationPerformance.m” 

1 .  Confidence  Interval  plot 

2.  Rank-1  score  mean  for  each  file. 
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%%  Functions  used:  N/A 


clc; 

clear; 

%%%%%  Execute  GDA  Polynomial  Deg  2  %%%%%%%%%%%%%% 
load  Mahalanobis_Rank_040818; 

Mah  =  Rank_718; 
av_Mah  =  mean(  Mah(:,l) ) 

load  Norm2_Rank_0408 1 7 ; 

N2  =  Rank_718; 
av_N2  =  mean(  N2(:,l)) 

load  Mahalanobis_N2_Rank_040820; 

Mah_N2  =  Rank_718; 
av_Mah_N2  =  mean(  Mah_N2(:,l) ) 

load  Mahalanobis_N  l_Rank_04082 1 ; 

MahNl  =  Rank_718; 
av_Mah_N  1  =  mean(  Mah_Nl(:,l) ) 

load  Mahalanobis_Ang_Rank_040822 ; 

Mah_Ang_l  =  Rank_718; 
av_Mah_Ang_l  =  mean(  Mah_Ang_l(:,l) ) 

%%%  It  has  both  Rank  &  Perf  per  Class 
load  Mahalanobis_Ang_Rank_040826; 

Mah_Ang_2  =  Rank_718; 
av_Mah_Ang_2  =  mean(  Mah_Ang_2(:,l) ) 

%%%%%%%  GDA  Polynom  Deg  2 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


%%%%%%%%%%  GDA  Polynom  Deg  1 
%%%%%%%%%%%%%%%%%%%%%%% 

%%%  Only  GDA  Pol  Deg  1=>  LDA 
%%%  It  has  both  Rank  &  Perf  per  Class 
load  Norm2_Rank_PerfClas_LDA_040827 ; 

N2GDA1  =  Rank_718; 
av_N2_GDAl  =  mean(  N2_GDA1(:,1) ) 

%%%  Only  GDA  Pol  Deg  1==>  LDA 
%%  It  has  only  Rank 

load  Mahalanobis_Ang_Rank _ LDA  040824; 

Mah_Ang_GD  A 1  =  Rank_718; 
av_Mah_Ang_GDAl  =  mean(  Mah_Ang_GDAl(:,l)  ) 
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%%%%  First  PCA  then  GDA  Pol  deg  1  <==>  PCA  +  LDA 
%%%  It  has  both  Rank  &  Perf  per  Class 
load  N2_Rank_PerfClas_PCAGDA  1  040829; 

N2PC  AGD  A 1  =  Rank_7 1 8 ; 
av  N2_PCAGDAl  =  mean(  N2_PCAGDA1(:,1) ) 

%%%%  First  PCA  then  GDA  Pol  deg  1  <==>  PCA  +  LDA 

%%%  It  has  both  Rank  &  Perf  per  Class 

load  Mah_Ang_Rank_PerfClas_PCAGDA  1  04090 1 ; 

MahAngPCAGDAl  =Rank_718; 

av_Mah_Ang_PC  AGD  A 1  =  mean(  MahAngPC  AGD  A 1  ( : ,  1 ) ) 

%%%%%%%%%%%  GDA  Polynom  Deg  1 
%%%%%%%%%%%%%%%%%%%%%%% 

%%%%%%%%%%%%%  GDA  Polynom  Deg  2  VARIOUS  NUMBER  EIGENVEC¬ 
TORS 

%%%%  GDA  Pol  deg  2  ==>  100  EigVec 

%%%  It  has  both  Rank  &  Perf  per  Class 

load  Mah_Ang_Rank_PerfClas_GDA2_100EigVec_04091 1; 

Mah_Ang_GDA2_l  00EV  =  Rank_718; 

av_Mah_Ang_GDA2_l 00EV  =  mean(  Mah_Ang_GDA2_100EV(:,l) ) 

%%%%  GDA  Pol  deg  2  ==>  150  EigVec 

%%%  It  has  both  Rank  &  Perf  per  Class 

load  Mah_AngRank_PerfClas_GDA_2_150EigVec_04 1 027; 

Mah_Ang_GD  A2_  1 5  0E  V  =  Rank_718; 

av_Mah_Ang_GD A2_  1 5 0E V  =  mean(  Mah_Ang_GDA2_150EV(:,l) ) 

%%%%  GDA  Pol  deg  2  ==>  200  EigVec 
%%%  It  has  both  Rank  &  Perf  per  Class 
load  Mah_Ang_Rank_PerfClas_GD  A_2_200EigV  ec_04 1026; 
Mah_Ang_GDA2_200EV  =  Rank_7 18; 

av_Mah_Ang_GDA2_200EV  =  mean(  Mah_Ang_GDA2_200EV(:,l) ) 

%%%%  GDA  Pol  deg  2  ==>  250  EigVec 
%%%  It  has  both  Rank  &  Perf  per  Class 
load  Mah_Ang_Rank_PerfClas_GDA_2_250EigVec_04 1025; 
Mah_Ang_GDA2_250EV  =  Rank_718; 

av_Mah_Ang_GD A2  250E V  =  mean(  Mah_Ang_GDA2_250EV(:,l) ) 

%%%%  GDA  Pol  deg  2  ==>  500  EigVec 

%%%  It  has  both  Rank  &  Perf  per  Class 

load  Mah_Ang_Rank_PerfClas_GD  A2_500EigVec_040908 ; 

Mah_Ang_GD  A2_5  00E  V  =  Rank_7 18; 

av_Mah_Ang_GD A2  500E V  =  mean(  Mah_Ang_GDA2_500EV(:,l) ) 
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%%%%%%%%%%%%%GDA  Polynom  Deg  2  VARIOUS  NUMBER  EIGENVEC¬ 
TORS 

%%%%  GDA  Pol  deg  3 

%%%  It  has  both  Rank  &  Perf  per  Class 

load  Mah_Ang_Rank_PerfClas_GD  A_3_04 1003; 

Mah_Ang_GDA3  =  Rank_718; 
av_Mah_Ang_GDA3  =  mean(  Mah_Ang_GDA3(:,l) ) 

%%%%  GDA  Pol  deg  3 

%%%  Find  Confidence  Intervals  of  the  Rank  Means- 
%%%  Assume  GAUSSIAN  distr,  default  CI=95% 

%%%  of  the  values  of  the  Ranks 

[muhat_M,sigmahat,muci_M,sigmaci]  =  normfit(  Mah ); 
[muhat_N2,sigmahat,muci_N2,sigmaci]  =  normfit(  N2  ); 
[muhat_MN2,sigmahat,muci_MN2,sigmaci]  =  normfit(  Mah_N2  ); 
[muhat_MNl,sigmahat,muci_MNl,sigmaci]  =  normfit(  Mah_N  1  ); 
[muhat_MAn,sigmahat,muci_MAn,sigmaci]  =  normfit(  Mah_Ang_l ); 

resol  =  10  ; 
rank  =10; 
for  k  =  1  :  rank; 

CI_M(:,k)  =  ( linspace(  muci_M(l,k) ,  muci_M(2,k) ,  resol ) )'; 

CI_N2(:,k)  =  ( linspace(  muci_N2(l,k) ,  muci_N2(2,k) ,  resol ) )'; 

CI_MN2(:,k)  =  ( linspace(  muci_MN2(l,k) ,  muci_MN2(2,k) ,  resol ) )'; 
CI_MNl(:,k)  =  ( linspace(  muci_MNl(l,k) ,  muci_MNl(2,k) ,  resol ) )'; 
CI_MAn(:,k)  =  ( linspace(  muci_MAn(l,k) ,  muci_MAn(2,k) ,  resol ) )'; 
end 

figure; 

%  %  plot(  muhat_MAn  ,  V  ); 
axis([0, 25, 0.975,1]);  grid  on;  hold  on; 
title('Confidence  Intervals  of  Means  of  first  5  Rank  Scores'); 
xlabel('Rank  Score  Index') ;ylabel('Rank  Score  Mean'); 

%%%%%  Create  resolution  for  95%  Confidence  Intervals 
for  k  =  1  :  resol; 

xl(k)=l;  x2(k)=2;  x3(k)=3;  x4(k)=4;  x5(k)=5;  x6(k)=6;  x7(k)=7;  x8(k)=8;  x9(k)=9; 
xl0(k)=10;  xl  1  (k)=  11; 

xl2(k)=12;  xl3(k)=13;  xl4(k)=14;  xl5(k)=15;  xl6(k)=16;  xl7(k)=17;  xl8(k)=18; 
xl9(k)=19; 

x20(k)=20;  x21(k)=21;  x22(k)=22;  x23(k)=23;  x24(k)=24;  x25(k)=25; 
end; 

%%%%  Plot  95%  Confidence  Intervals  of  first  5  Rank  Scores 
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plot(  0  ,  CI_M(1,1),  ’b’ ,  0  ,  CI_N2(1,1),  ’g’ ,  0  ,  CI_MN2(1,1),  ’r' ,  0  ,  CI_MN1(1,1),  ’m’ , 

0  ,  CI_MAn(l,l),  ’c’  ); 

plot(  xl  ,  CI_M(:,1),  ’b*’ ,  x6  ,  CI_M(:,2),  ’b*’ ,  xl  1  ,  CI_M(:,3),  ’b*’ ,  xl6  ,  CI_M(:,4), 

’b*’ ,  x21  ,  CI_M(:,5),  V  ); 

plot(  x2  ,  CI_N2(:,1),  ’g*1 ,  x7  ,  CI_N2(:,2),  'g*' ,  xl2  ,  CI_N2(:,3),  'g*' ,  xl7  ,  CI_N2(:,4), 
•g*'  5  x22  ,  Cl  N2(:,5),  'g*'  ); 

plot(  x3  ,  CI_MN2(:,1),  'r*' ,  x8  ,  CI_MN2(:,2),  'r*' ,  xl3  ,  CI_MN2(:,3),  'r*' ,  xl8  , 
CI_MN2(:,4),  'r*' ,  x23  ,  CI_MN2(:,5),  'r*'  ); 

plot(  x4  ,  CI_MN1(:,1),  'm*' ,  x9  ,  CI_MN1(:,2),  'm*' ,  xl4  ,  CI_MN1(:,3),  'm*' ,  xl9  , 
CI_MN1(:,4),  'm*' ,  x24  ,  CI_MN1(:,5),  'm*'  ); 

plot(  x5  ,  CI_MAn(:,l),  'c*' ,  xlO  ,  CI_MAn(:,2),  'c*' ,  xl5  ,  CI_MAn(:,3),  'c*' ,  x20  , 
CI_MAn(:,4),  'c*' ,  x25  ,  CI_MAn(:,5),  'c*'  ); 
legend('CI-M',  'CI-N2',  'CI-MN2',  'CI-MN1',  'CI-MAn'  ) 

•  PCAGDAl.m: 


%% 

%%  Filename: 

%%  Thesis  Advisor: 
%%  Author: 

%% 

%%  Date: 

%%  Description: 

%% 

%% 

%%  Parameters: 

%% 

%% 

%% 

%%  Outputs: 

%% 

%%  Functions  used: 

%% 

%% 


PCAGDAl.m 

Prof.M.P.Fargues  Naval  Postgraduate  School 
Captain  Dimitrios  I.  Domboulas 
Hellenic  Air  Force 
August  2004 

Computes  classification  performance  for  the  LDA 
method  (i.e.  GDA  with  polynomial  kernel  deg-1) 
or  of  the  Fisherface  method  (PCA  +  GDA-1) 

1.  Method  to  be  used  (GDA-1  or  Fisherface) 

2.  Number  of  K  matrix  eigenvectors  kept 

3.  Distance  type  to  be  used 

4.  Number  of  iterations 

1.  Rank-(l-50)  Scores 

2.  Classification  Performance  per  Class 
buildGDA  Opt.m,  eigensystem.m,  dataST.m, 
spreadGDA  Opt.m,  pca.m,  sortem.m 


clc; 

clear; 

tic  %%  start  timing  1000  iterations 
load  A_all; 

clear  img  A  Db  Dme  x  ans  img  name  N  class  j  k  m  n  s  si  sqrsl  teml  tem2; 
clear  IndSamples  N  objects  Section  im_numl  im_num2  Person  T  time  v  w; 
Atemp  =  double(Aall); 
clear  A_all; 
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%%  CONSTANTS  ’  DECLARATION 

TrainClassSize  =18;  %%  Number  of  Training  Images  per  Class 
Test_Class_Size  =  12;  %%  Number  of  Testing  Images  per  Class 
Num_Train_Imag  =  900;  %%  Total  Number  of  Training  Images 
Num_Test_Imag  =  600;  %%  Total  Number  of  Testing  Images 
Num_Class  =  50;  %%  Total  Number  of  Classes 
cs  =  [18,18,18,18,18,18,18,18,18,18]; 

Class_Sizes  =  [cs,  cs,  cs,  cs,  cs];  %%(1X50)  vector,  with  class  sizes  per  class 
selmethod  =  1;  %%%Use  0  for  GDA-1  and  anything  else  for  Fisherface 
degree  =  1;  %%%  ALWAYS  Polynomial  kernel  deg  1  FOR  GDA-1==>LDA 
Num_ev  =  0;  %%%  input  Number  of  K  matrix  eigenvectors  (0  =>  Default) 
select_dist  =  1 ;  %%%  select  distance 

Numlter  =  1 ;  %%%  Number  of  MAIN  LOOP  ITERATIONS 

h=waitbar(0,'  MAIN  ITERATION  LOOP');  %%%%  WAITBAR 

for  loop  =  1  :  Num  lter;  %%%%%  MAIN  ITERATION  LOOP  %%%%%%%%%% 

waitbar(loop/Num_Iter  ,  h);  %%%%  WAITBAR 

%%%%%%%%%%%%%%%%%%%%%0/o0/o0/o0/o0/o%0/o%%0/o%0/o0/o%0/o0/o%%%%%% 

%%%%  Random  Permutation  Section  Borrowed  from  [Lee,  2004]  and  modified  by 
%%%%  [DOMBOULAS,  2004], 

A  =  Atemp;  %%  A  contains  the  Training  Images  (2700x900) 

B  =  [];  %%  B  contains  the  testing  images  (2700x600) 

Delete  =  []; 

for  i=l  :  150  %  i  goes  from  1  to  the  total  number  of  pictures 
rp  =  randperm(lO); 

LI  =  (i-1)  *10+1; 

L2  =  i*10; 

A( : ,  LI  :  L2  )  =  A( : ,  (i-1)  *  10  +  rp );  %%  permute  images 
Index  =  [LI  Ll+1  Ll+2  Ll+3]; 

B  =  [B  A( : ,  Index )]; 

Delete  =  [Delete  Index]; 
end 

A( : ,  Delete )  =  []; 

%%%%  Section  Borrowed  from  [Lee,  2004]  and  modified  by  [DOMBOULAS,  2004], 

%%%%0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o0/o%%% 

%%%%%  Run  GDA-1  OR  PCA  first,  then  GDA-1  (Fisherface)  %%%%%%%%%%% 
%%%%%  Run  GDA-1  %%%%%%% 
if  sel  method  ==  0; 

A  =  DataSt(  A );  %%  Normalize  Data — >  Mean  =0,  Standard  Deviation  =  1 
B  =  DataSt(  B  );  %%  Normalize  Data — >  Mean  =0,  Standard  Deviation  =  1 
Le  =  A';  %%%  Learning  Data  (900  Images  are  rows) 

Te  =  B';  %%  600  Testing  images  of  our  Data  Base  (Images  are  rows) 
[dataGDA,centeredkM,kM]  =  buildGDA_Opt(  Le  ,  Class_Sizes  ,  degree,  Num_ev ); 
%%run  GDA 
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Pgda  A  =  spreadGDA_Opt(  Le,  Le,  dataGDA,  degree );  %%  Projected  Training  images 
in  GDA 

PgdaB  =  spreadGDA_Opt(  Te,  Le,  dataGDA,  degree  );  %%  Projected  Testing  images  in 
GDA 

else  %%%%  Run  Fisherface 
%%%%%%%%  PCA 

[Wpca,m,Amean,Ad]  =  pca(A);  %%%Apply  PCA  to  IrisData 

PApca  =  Wpca'  *  (Ad);  %%  Compute  the  PCA  projection  matrix  of  IrisData 

PBpca  =  Wpca'  *  (B  -  Amean*ones(l,Num_Test_Imag)); 

%%%%%  GDA  Pol  Deg  1  ==>  LDA 

Le  =  PApca';  %%%  Learning  Data  (900  Images  are  rows)  (DO  NOT  NORMALIZE 
DATA) 

Te  =  PBpca';  %%%  600  Testing  images  of  our  Data  Base  (Images  are  rows)  (DO  NOT 
NORMALIZE  DATA) 

[dataGDA,centeredkM,kM]  =  buildGDA_Opt(  Le  ,  Class  Sizes  ,  degree,  Num  ev  ); 
%%run  GDA,  Polyn  Deg  1 

Pgda  A  =  spreadGDA_Opt(  Le,  Le,  dataGDA  ,  degree );  %%  Projected  Train  images  in 
GDA,  Polyn  Deg  1 

Pgda  B  =  spreadGDA_Opt(  Te,  Le,  dataGDA  ,  degree  );  %%  Projected  Test  images  in 
GDA,  Polyn  Deg  1 
end;  %%%  end  if 

%%%%%%%%%%%%%%%%%%%%%  PCA LDA  %%%%%%%%%%%%% 

%%%  FIND  VARIANCE  of  each  column  of  the  900X49  Projected  Train  Data  Matrix. 
%%%  Covariance  is  used  for  other  Image  Distance  types 
for  k  =  1  :  size(  Pgda  A  ,  2  ); 

Pr_Tr_Dat_Cov_Col(k)  =  cov(  Pgda_A(:,k) );  %%  Row  Vector  1X49  with  variances 
%%%  of  each  of  the  49  columns  of  the  Projected  Training  Data  Matrix  900X49 
end; 

%%%%  49X49  covariance  matrix  of  Proj  Train  Data  Matrix 
Pr_Tr_Dat_Cov_Mat  =  cov(Pgda_A); 

%%%%%%  It  is  Diagonal!! Invertible  &  Each  Dimension 's  features  are 
%%%%%%  independent  from  each  othert  ==>  It  is  the  Ideal  case 
Inv_Cov  =  inv(  Pr_Tr_Dat_Cov_Mat );  %%%  Inverse  Covar  Matr 
%%%  For  the  Approximated  Mahalanobis  Distance 
for  k  =  1  :  size(  Inv_Cov,2  ); 

s_inv(k)  =  Inv_Cov(k,k);  %%  ROW  VECTOR  1  x  49 
end; 

sr_s_inv  =  sqrt(s_inv(k));  %%  Row  Vector 

sr_inv_diag  =  diag(  sr_s_inv  );  %%Diagonal  Matrix  with  diag  elem  the 
%%%  square  roots  of  the  inverse  diag  elem  of  cov  matrix  49  x  49 
%%%%%%%%%%%%%  END  VARIANCE  SECTION  %%%%%%%%%%% 

%%  Find  Centroids  per  Each  Class  Projected  Training  Images 
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for  k=  1  :  Num_Class 

ProjTrainCentr(  k , : )  =  mean(  Pgda_A(  ((k-l)*Train_Class_Size  +  1) :  ( k  * 

TrainClassSize ),:)); 

end; 

%%%  create  cell  array 

DistName  =  {'Norm-2  ';'Mahalanobis 

'Mahalanobis  Norm-2  ';'Mahalanobis  Norm-1  ';'Mahalanobis  Angular'}; 

%%%  Find  Various  Distances  for  Testing  Images 
for  k  =  1  :  Num_Test_Imag  ; 
for  m  =  1 :  Num_Class; 

if  select_dist  ==  1;  %%  NORM-2  Distance 

TestDist(k,m)  =  norm(  Pgda_B(k,:)  -  ProjTrainCentr(m,:),2  ); 

elseif  selectdist  ==  2;  %%  MAHALANOBIS  DISTANCE 

TestDist(k,m)  =  sqrt(  (Pgda_B(k,:)  -  ProjTrainCentr(m,:))  *  Inv_Cov  *  (Pgda_B(k,:)  - 
ProjTrainCentr(m,:))' );  %%  Mahalanobis  Distance 

elseif  select  dist  ==  3;  %%  Mahalanobis  NORM-2  Distance 
temp  =  (  sr_s_inv  .*  (  Pgda_B(k,:)  -  ProjTrainCentr(m,:) ) )  *  (  sr_s_inv  .*  ( 
Pgda_B(k,:)  -  ProjTrainCentr(m,:)  )  )' ; 

TestDist(k,m)  =  sqrt(  sum(  temp  ) );  %%  Approxim  Mahalan  Dist 

elseif  select  dist  ==  4;  %%  Mahalanobis  NORM-1  Distance 
TestDist(k,m)=sr_s_inv  .*  norm(  Pgda_B(k,:)  -  ProjTrainCentr(m,:),l ); 

else  %%%%  MAHALANOBIS  ANGULAR  DISTANCE 

TestDist(k,m)  =  acos(  ( (  Pgda_B(k,:)  *  sr  inv  diag  )  *  (  sr  inv  diag  *  ProjTrain- 
Centr(m,:)' ) )  /  ( norm(  Pgda_B(k,:).*sr_s_inv  ,2  )  *  norm(  ProjTrain- 
Centr(m,:).*sr_s_inv  ,2))); 

end;  %%%  end  if 

end 

end 

%%%  Assign  1-50  Class  Indeces  for  each  of  the  600  test  Images 
T_B  =  [1,1,1, 1,1,1, 1,1, 1,1, 1,1]; 
for  m  =  1  :  Num_Class; 

T_B(  (  Test_Class_Size  *  (m-1)  +  1  ) :  (  Test_Class_Size  *  m )  )  =  m  *  T_B( 
l:Test_Class_Size) ; 

end;  %%%  T_B  contains  the  class  number  of  each  testing  image  (1X600) 

%%%  Sort  Distances  per  Testing  Image  -  RANK 
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for  k  =  1  :  Num_Test_Imag  ; 
tempi  =  sort(  TestDist(k,:) ); 

I(k)  =  find(  (tempi  ==  TestDist(k,T_B(k))*ones(l,Num_Class)) ); 
end; 

%%%  Performance  Measurement  Per  Class 
for  k  =  1  :  NumClass; 

temp2  =  fmd(  I(  ((k-l)*Test_Class_Size  +  1) :  ( k  *  Test_Class_Size ) )  ==  ones( 
l,Test_Class_Size ) ); 

Perf_Class(k)  =  size( temp2 ,2)1  Test_Class_Size; 
end; 

%%%  Find  the  probability  of  each  of  the  50  Ranks  assigned  to  all  the  testing  images 
for  k  =1  :  Num_Class  ; 

temp3  =  find(  I  ==  k  *  ones(  l,Num_Test_Imag  ) ); 

Prob(k)  =  size( temp3 ,2)1  Num  Test  lmag; 
end; 

%%%%  RANK  SCORE  Evaluation  -  Cummulative  Probability  of  Ranks  -  Increasing 
Order 

Rank(l)  =  Prob(l); 
for  k  =  2  :  Num_Class  ; 

Rank(  k )  =  Rank(  k-1 )  +  Prob(  k );  %%  Rank  is  1X50  with  accumulated  Probability 
end; 

Rank_718(loop,:)  =  Rank; 

Perf_Class_718(loop,:)  =  Perf_Class; 

clear  A  TA  B  TB  f  LI  L2  Le  Te  dataGDA  centeredkM  kM; 
clear  Pgda  A  PgdaB  k  j  Pr_Tr_Dat_Cov_Col  Pr_Tr_Dat_Cov_Mat; 
clear  TrainDist  TestDisttemp  Pgda  Aindex  Classlndex; 
clear  Inv_Cov  s  inv  sr_inv_diag  ProjTrainCentr  DistName; 
clear  PgdaB  Index  Classlndex  T_B  tempi  I  temp2  temp3; 
clear  Prob  Delete  d  diml  dim2  i  m  n  sr  s  inv; 
clear  Rank  Perf  Class  Pgda  AIndex  TestDist  cs  rp; 

%  %  save  Mah_Ang_Rank_PerfClas_GDA_0_7_040926  Rank_718  Perf_Class_7 1 8  ; 
save  Mah_Ang_Rank_PerfClas_GDA_l_041116  Rank_718  Perf_Class_718  ; 

end;  %%%%%%  END  OF  MAIN  ITERATION  LOOP  %%%%%%%%%%%% 
close(h);  %%%%%%%%%%%%%  WAITBAR 

time(loop)  =  toe;  %%  Stop  timing  each  Main  Iteration  Loop 
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Cumulative  Rank  Score 


APPENDIX  C.  RANK  SCORE  PLOTS 


In  this  appendix,  the  following  typical  random  rank  score  plots  are  provided.  Each 
was  derived  from  one  random  iteration  of  the  program  “DistanceSelection.m”,  listed  in 
Appendix  B. 


Cummulative  Rank  Score  for  600  Testing  Images 


Figure  C.  1 .  Rank  score  plot  example  1 . 
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Cumulative  Rank  Score 


1.1 


Cummulative  Rank  Score  for  600  Testing  Images 


Figure  C.2.  Rank  score  plot  example  2. 
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Cumulative  Rank  Score 


1.1 


Cummulative  Rank  Score  for  600  Testing  Images 


Figure  C.3.  Rank  score  plot  example  3. 
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